Mixing SetEnvironmentVariable and getenv is asking for trouble, as we recently found to our dismay (and exasperation!). It took some serious debugging to figure out the actual problem – let’s see if you can figure it out.

Here’s some C++/CLI code that uses getenv to read the value of an environment variable and print it to the console.

    public ref class EnvironmentVariablePrinter
    {
        public:
            void PrintEnv(String ^key)
            {
                // Copy the managed string into unmanaged ANSI memory for getenv.
                IntPtr ptr = Marshal::StringToHGlobalAnsi(key);
                const char *ckey = (const char *)ptr.ToPointer();

                // getenv returns NULL when the CRT's copy has no such variable.
                const char *cval = getenv(ckey);
                string val = cval == NULL ? "" : string(cval);

                cout << "Key is " << ckey << " and value is " << val;

                Marshal::FreeHGlobal(ptr);
            }
    };

PrintEnv merely converts the .NET string into an unmanaged one and calls getenv, printing the returned value to the console.

And here’s some C# code that tests the class.

    class Test
    {
        const string key = "MyKey";
        public static void Main()
        {
            Environment.SetEnvironmentVariable(key, "MyValue");
            PrintValue();
        }

        private static void PrintValue()
        {
            EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
            er.PrintEnv(key);
        }
    }

The code uses System.Environment.SetEnvironmentVariable to set the value of the variable and then calls the C++/CLI code to verify that it prints the correct value. And of course, being written in different languages, the two pieces of code must reside in different projects, say CPlusPlusLib.dll and ConsoleApplication.exe, with the latter referencing the former.

No surprises here - this works as expected and prints “Key is MyKey and value is MyValue”.

However, a seemingly harmless change breaks the code big time.

    class Test
    {
        const string key = "MyKey";
        public static void Main()
        {
            Environment.SetEnvironmentVariable(key, "MyValue");
            EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
            er.PrintEnv(key);
        }
    }

All I’ve done is inline PrintValue, yet running this code prints “Key is MyKey and value is” – getenv is now returning NULL instead of “MyValue”.

It gets even more interesting.

    class Test
    {
        const string key = "MyKey";
        public static void Main()
        {
            Environment.SetEnvironmentVariable(key, "MyValue");
            EnvironmentVariablePrinter er = null;
            PrintValue();
        }

        private static void PrintValue()
        {
            EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
            er.PrintEnv(key);
        }
    }

This doesn’t work either – the mere declaration of EnvironmentVariablePrinter inside Main makes getenv return NULL for “MyKey”. This can’t be good, can it?

Actually yes, because that is a valuable clue – it means the change in behavior has something to do with JITting. As long as all code that references EnvironmentVariablePrinter is in a separate method, everything works fine (in debug mode, at least). Things start going south when any such code is in Main itself.

What would the JITter do differently in the two scenarios? Load the assembly containing EnvironmentVariablePrinter (CPlusPlusLib.dll) at different times, of course. When all code referencing EnvironmentVariablePrinter is inside PrintValue, it has to load the DLL only when JITting PrintValue, whereas in the other case, it has to load it when JITting Main. JITting Main obviously occurs well before JITting PrintValue, so DLL load time (relative to other code) is one big difference between the two scenarios that occurs because of JITting.

Why would loading CPlusPlusLib.dll a little early make getenv return NULL?

To understand that, you’ll first have to know how the getenv function works. Windows has APIs to set and get environment variables (SetEnvironmentVariable/GetEnvironmentVariable), and the .NET method P/Invokes into the Windows API to set and get values. getenv, on the other hand, is a CRT function, and does not delegate to the Windows API. Instead, the CRT gets all environment variables and their values when it is starting up (using the GetEnvironmentStrings Windows API), and copies them into its own data structures (MSVCR80!environ). getenv then works on the copied data from then on.
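To make the caching behaviour concrete, here’s a rough C# model of it. This is only a sketch: CachedEnvironment and CachedEnvironmentDemo are names I made up, and the class merely imitates what the CRT does (copy the environment once, then answer lookups from the copy). It is not the CRT’s actual code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the CRT's private copy of the environment.
// It snapshots the process environment once, at construction time.
public sealed class CachedEnvironment
{
    private readonly Dictionary<string, string> snapshot =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public CachedEnvironment()
    {
        foreach (DictionaryEntry entry in Environment.GetEnvironmentVariables())
            snapshot[(string)entry.Key] = (string)entry.Value;
    }

    // Like getenv: consults only the snapshot, never the live environment.
    public string GetEnv(string key)
    {
        string value;
        return snapshot.TryGetValue(key, out value) ? value : null;
    }
}

public static class CachedEnvironmentDemo
{
    public static void Run()
    {
        const string key = "MyKey_SnapshotDemo";
        Environment.SetEnvironmentVariable(key, null);   // make sure it starts unset

        var early = new CachedEnvironment();             // "CRT loaded early"
        Environment.SetEnvironmentVariable(key, "MyValue");
        var late = new CachedEnvironment();              // "CRT loaded late"

        Console.WriteLine("early: " + (early.GetEnv(key) ?? "(null)")); // early: (null)
        Console.WriteLine("late:  " + (late.GetEnv(key) ?? "(null)"));  // late:  MyValue
    }
}
```

Calling CachedEnvironmentDemo.Run() from a Main shows the same split as the two test programs: the snapshot taken before the set never sees the variable, while the one taken afterwards does.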

Now do you see the problem? When CPlusPlusLib.dll is loaded early (when JITting Main), the CRT also gets loaded as one of its dependencies, and the startup code that copies environment variables runs right away. At that point, Main hasn’t even been JITted yet, so there’s no way our call to System.Environment.SetEnvironmentVariable could have run by that time. And when it actually runs, it’s too late – the CRT’s environ block was populated much earlier, and calling the Windows API’s SetEnvironmentVariable has no effect on the cached values. When getenv runs, it looks in the cached values and returns NULL.

It’s easy to see why it works in the first case now – CRT loading occurs when JITting PrintValue, and that occurs after our call to SetEnvironmentVariable has executed. Which means that when it calls GetEnvironmentStrings as part of startup, it gets the variable (and its value) that we just set.

Nasty, ain’t it? The actual scenario was a lot messier – things suddenly stopped working when we linked to a DLL ported to VS2008. We actually figured out the problem backwards – we first saw that the CRT load time was different, theorized how getenv works, verified the theory by stepping through the assembly code and looking at the environ block, and once we understood the problem, figured out what was causing early loading of the CRT. Windbg was awesome for debugging this - things would have been very difficult if not for sxe ld:MSVCR90 and x MSVCR80!environ.

The fix was rather simple - in our case, we merely had to move the code that sets the environment variable so that it runs before the CRT loads. There’s another twist though; mscorwks.dll, which is the heart of the CLR, loads MSVCR80 when it loads, and you can’t set your environment variables before that, not from managed code anyway. Fortunately, in our case, the getenv call is from a library that links to MSVCR90, so as long as we set the environment variable before that version of the CRT loads, we’re good to go. Until the CLR gets linked to MSVCR90, anyway :).
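One way to make the ordering less dependent on build settings is to keep everything that touches the native-backed type in its own method and mark that method as non-inlinable. The sketch below only shows the shape of the idea. DeferredLoadSketch and ReadValue are made-up names, and I’m assuming the JIT resolves the types a method mentions when it compiles that method; the runtime doesn’t promise exactly when it loads an assembly.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class DeferredLoadSketch
{
    public static string Run()
    {
        // Runs first: nothing in this method mentions the native-backed type.
        Environment.SetEnvironmentVariable("MyKey", "MyValue");
        return ReadValue();
    }

    // NoInlining keeps this a separate method even in optimized builds,
    // so it is compiled on its own rather than folded into Run.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static string ReadValue()
    {
        // In the real program, this is where EnvironmentVariablePrinter
        // (and with it CPlusPlusLib.dll and its CRT) would first be used.
        // Here the managed API stands in for it.
        return Environment.GetEnvironmentVariable("MyKey");
    }
}
```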

Attach to what? To the process you want to debug, of course.

How many developers attach to and debug arbitrary processes running on their machines? Very few, I’d imagine. And I’d think that even those few people typically prefer Windbg or an equivalent debugger to the one supplied with Visual Studio.

Which means that aside from ASP.NET developers, almost all of us using Visual Studio attach to the application that we are working on and nothing else. To attach:

1. You summon the “Attach to Process” dialog by hitting Ctrl + Alt + P. Or if you are a mouse person, you go to Debug –> Attach to Process.

2. You search for your application in the list of processes shown and select it.

3. You optionally change some settings and then hit OK.

The second step can be particularly annoying, especially if the name of your application starts with a particularly common letter that occurs in the latter half of the English alphabet (it’s a tie between ‘s’ and ‘v’ on my machine). Even otherwise, if you’ve worked on the application for a significant amount of time, the keystrokes to select it become part of muscle memory (down arrow, down arrow, Enter, for example), and you occasionally end up attaching to the wrong application because some other process sneaked in. Surely there must be a better way?

Enter JustAttach – a macro that does just that. It finds out the output file path of the startup project of your solution and automatically attaches to it.

The full macro code is at the end of the blog post. You can also download it, unzip it, open Macro Explorer (View –> Other Windows –> Macro Explorer) and select Load Macro Project to start using it right away.

Do your fingers a favor by binding the command to a VS shortcut (Tools –> Options –> Keyboard, type JustAttach in the textbox and choose a shortcut like Ctrl + Alt + Y); they will thank you for it :).

    Imports System
    Imports EnvDTE
    Imports EnvDTE80
    Imports EnvDTE90
    Imports System.Diagnostics

    Public Module SenthilMacros
        ' Attaches the debugger to the running output of the solution's startup project.
        Public Sub JustAttach()
            Dim solutionBuild As SolutionBuild = DTE.Solution.SolutionBuild
            Dim startupProjectName As String = solutionBuild.StartupProjects(0)

            If String.IsNullOrEmpty(startupProjectName) Then
                MsgBox("Could not attach because the startup project could not be determined", MsgBoxStyle.Critical, "Failed to Attach")
                Return
            End If

            Dim startupProject As Project = FindProject(startupProjectName.Trim())
            If startupProject Is Nothing Then
                MsgBox("Could not attach because the startup project could not be found in the solution", MsgBoxStyle.Critical, "Failed to Attach")
                Return
            End If

            Dim outputFilePath As String = FindOutputFileForProject(startupProject)

            If String.IsNullOrEmpty(outputFilePath) Then
                MsgBox("Could not attach because output file path for the startup project could not be determined", MsgBoxStyle.Critical, "Failed to Attach")
                Return
            End If

            Attach(outputFilePath)
        End Sub

        ' Attaches to the first local process whose name matches the given output file path.
        Sub Attach(ByVal file As String)
            Dim process As EnvDTE.Process

            For Each process In DTE.Debugger.LocalProcesses
                ' Windows paths are case-insensitive, so compare accordingly.
                If String.Equals(process.Name, file, StringComparison.OrdinalIgnoreCase) Then
                    process.Attach()
                    Return
                End If
            Next

            MsgBox("Could not attach because " + file + " is not found in the list of running processes", MsgBoxStyle.Critical, "Failed to Attach")
        End Sub

        ' Returns the top-level project with the given unique name, or Nothing if there is none.
        Function FindProject(ByVal projectName As String) As Project
            Dim project As Project
            For Each project In DTE.Solution.Projects
                If project.UniqueName = projectName Then
                    Return project
                End If
            Next
            Return Nothing
        End Function

        ' Builds the full output path from the project folder, the active configuration's output folder and the output file name.
        Function FindOutputFileForProject(ByVal project As Project) As String
            Dim fileName As String = project.Properties.Item("OutputFileName").Value.ToString()
            Dim projectPath As String = project.Properties.Item("LocalPath").Value.ToString()

            Dim config As Configuration = project.ConfigurationManager.ActiveConfiguration
            Dim buildPath As String = config.Properties.Item("OutputPath").Value.ToString()

            If String.IsNullOrEmpty(fileName) OrElse String.IsNullOrEmpty(projectPath) Then
                Return ""
            End If

            Dim folderPath As String = System.IO.Path.Combine(projectPath, buildPath)
            Return System.IO.Path.Combine(folderPath, fileName)
        End Function
    End Module

Posted by Senthil | with no comments

The previous post discussed having anonymous methods as event handlers and ended with a question – why doesn’t unsubscription work while subscription works out alright?

Vivek got the answer spot on – the way the C# compiler handles and translates anonymous methods is the reason.

Here’s the code involved.

    public void Initialize()
    {
        control.KeyPressed += IfEnabledThenDo(control_KeyPressed);
        control.MouseMoved += IfEnabledThenDo(control_MouseMoved);
    }

    public void Destroy()
    {
        control.KeyPressed -= IfEnabledThenDo(control_KeyPressed);
        control.MouseMoved -= IfEnabledThenDo(control_MouseMoved);
    }

    public EventHandler<Control.ControlEventArgs> IfEnabledThenDo(EventHandler<Control.ControlEventArgs> actualAction)
    {
        return (sender, args) => { if (args.Control.Enabled) actualAction(sender, args); };
    }

The compiler translates IfEnabledThenDo into this

    public EventHandler<Control.ControlEventArgs> IfEnabledThenDo(EventHandler<Control.ControlEventArgs> actualAction)
    {
        <>c__DisplayClass1 CS$<>8__locals2 = new <>c__DisplayClass1();
        CS$<>8__locals2.actualAction = actualAction;
        return new EventHandler<Control.ControlEventArgs>(CS$<>8__locals2.<IfEnabledThenDo>b__0);
    }

Now the problem should be fairly obvious – every time the function is called, a new object gets created, and the event handler returned actually refers to a method (<IfEnabledThenDo>b__0) on the new instance. And that’s what breaks unsubscription. -= will not remove a delegate that targets a different instance of the same class from the invocation list – if it did, the consequences would not be pleasant when multiple instances of the same class subscribe to an event.
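You can see the mismatch without any events involved. The snippet below is a small sketch with made-up names (HandlerFactory, Wrap, OnSomething, DelegateEqualityDemo); Wrap has the same shape as IfEnabledThenDo, minus the Enabled check, and uses plain EventArgs to stay self-contained.

```csharp
using System;

public class HandlerFactory
{
    // Same shape as IfEnabledThenDo: the lambda captures the parameter,
    // so each call allocates a fresh closure object.
    public EventHandler<EventArgs> Wrap(EventHandler<EventArgs> actualAction)
    {
        return (sender, args) => actualAction(sender, args);
    }

    public void OnSomething(object sender, EventArgs e) { }
}

public static class DelegateEqualityDemo
{
    public static void Run()
    {
        var factory = new HandlerFactory();

        // Two calls, two closure instances, two unequal delegates.
        EventHandler<EventArgs> a = factory.Wrap(factory.OnSomething);
        EventHandler<EventArgs> b = factory.Wrap(factory.OnSomething);
        Console.WriteLine(a == b);                              // False
        Console.WriteLine(ReferenceEquals(a.Target, b.Target)); // False

        // Plain method-group delegates share target and method, so they compare equal.
        EventHandler<EventArgs> c = factory.OnSomething;
        EventHandler<EventArgs> d = factory.OnSomething;
        Console.WriteLine(c == d);                              // True

        // -= with a non-matching delegate leaves the invocation list unchanged.
        EventHandler<EventArgs> list = a;
        list -= b;
        Console.WriteLine(list != null);                        // True: a is still there
        list -= a;
        Console.WriteLine(list == null);                        // True: same instance removed
    }
}
```

Delegate equality compares the target object and the method, which is why the two wrapped delegates differ even though they were built from the same handler.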

But why does the compiler translate our lambda expression this way? Raymond Chen has a great blog post explaining why, but the short answer is that it is needed to “hold” actualAction (the method parameter to IfEnabledThenDo) so that it is available when the event handler actually executes.

Now that we know why, the way to get around this issue is to cache the delegate instance returned by IfEnabledThenDo and use the same instance for subscription and unsubscription.

    EventHandler<Control.ControlEventArgs> keyPressed;
    EventHandler<Control.ControlEventArgs> mouseMoved;

    public void Initialize()
    {
        keyPressed = IfEnabledThenDo(control_KeyPressed);
        mouseMoved = IfEnabledThenDo(control_MouseMoved);

        control.KeyPressed += keyPressed;
        control.MouseMoved += mouseMoved;
    }

    public void Destroy()
    {
        control.KeyPressed -= keyPressed;
        control.MouseMoved -= mouseMoved;
    }

Knowing how things work under the hood has its advantages, I guess :)

 

PS : A very small syntactic change to the original example would have made the code work right away. If you’ve followed along this far, you should be able to figure out why.

    public void Initialize()
    {
        actualAction = control_KeyPressed;
        control.KeyPressed += IfEnabledThenDo();
    }

    public void Destroy()
    {
        control.KeyPressed -= IfEnabledThenDo();
    }

    EventHandler<Control.ControlEventArgs> actualAction;
    public EventHandler<Control.ControlEventArgs> IfEnabledThenDo()
    {
        return (sender, args) => { if (args.Control.Enabled) actualAction(sender, args); };
    }

The syntactic sugar offered by anonymous methods makes them great candidates for writing event handlers; together with smart type inference, they reduce the amount of code written by an order of magnitude.

And that’s without considering the power offered by closures. With event handlers, closures allow you to kind of “stuff” extra parameters into the handler, without changing the actual number of formal parameters. This blog post shows a situation where an anonymous method acting as an event handler makes code simpler, and then goes on to show a gotcha with un-subscription and anonymous methods.

Here’s a simple Control class that fires a bunch of events.

    class Control
    {
        public class ControlEventArgs : EventArgs
        {
            public Control Control {get;set;}
        }

        public bool Enabled { get; set; }

        public event EventHandler<ControlEventArgs> KeyPressed;
        public event EventHandler<ControlEventArgs> LeftButtonClicked;
        public event EventHandler<ControlEventArgs> RightButtonClicked;
        public event EventHandler<ControlEventArgs> MouseMoved;
    }

Let’s say you’re developing a GUI application with this class, and you want to handle events only if the originating control is visually enabled i.e., Enabled set to true. Pretty reasonable constraint, but tedious to implement, if you go the standard way of adding the check to the start of each of your event handlers.

    class GUIApp
    {
        public void Initialize()
        {
            Control control = new Control();
            control.KeyPressed += new EventHandler<Control.ControlEventArgs>(control_KeyPressed);
            control.MouseMoved += new EventHandler<Control.ControlEventArgs>(control_MouseMoved);
        }

        void control_MouseMoved(object sender, Control.ControlEventArgs e)
        {
            if (e.Control.Enabled)
            {
                ///
            }
        }

        void control_KeyPressed(object sender, Control.ControlEventArgs e)
        {
            if (e.Control.Enabled)
            {
                ///
            }
        }
    }

With an anonymous method, you could write a far more terse and easy to maintain version

    class GUIApp
    {
        public void Initialize()
        {
            Control control = new Control();
            control.KeyPressed += IfEnabledThenDo(control_KeyPressed);
            control.MouseMoved += IfEnabledThenDo(control_MouseMoved);
        }

        public EventHandler<Control.ControlEventArgs> IfEnabledThenDo(EventHandler<Control.ControlEventArgs> actualAction)
        {
            return (sender, args) => { if (args.Control.Enabled) { actualAction(sender, args); } };
        }

        void control_MouseMoved(object sender, Control.ControlEventArgs e)
        {
            ///
        }

        void control_KeyPressed(object sender, Control.ControlEventArgs e)
        {
            ///
        }
    }

IfEnabledThenDo returns an anonymous function that first checks whether the control is enabled before calling the actual function. The code is much shorter, and the condition is checked only in one place, which makes it easy to modify or add additional logic without having to remember to change every single event handler. Plus, the code reads like an English statement – subscribe to the event and if enabled, then do whatever else is necessary.

Great, but unless you are a masochist who revels in littering the code base with hard to reproduce bugs that bomb your app only when demoing to your most important customer, you must, of course, write code to unsubscribe. But there’s no method name to refer to, so you do it the same way as you did when subscribing.

    public void Initialize()
    {
        control.KeyPressed += IfEnabledThenDo(control_KeyPressed);
        control.MouseMoved += IfEnabledThenDo(control_MouseMoved);
    }

    public void Destroy()
    {
        control.KeyPressed -= IfEnabledThenDo(control_KeyPressed);
        control.MouseMoved -= IfEnabledThenDo(control_MouseMoved);
    }

This, unfortunately, won’t work – the application will still remain subscribed to those events. Can you figure out why?

Answer and more in the next blog post.


You're writing this really cool and innovative class to calculate the first hundred thousand natural numbers. You think about the API, and you realize that returning an array of the numbers a) might take a long time to complete, and b) is going to cause memory usage to spike up like mad.

So you decide to stream them instead, returning an IEnumerable<T> instance instead of an array. Being a crack C# developer, you use the incredibly powerful yield keyword, rather than rolling your own implementation of the IEnumerable<T> interface.

    class MyCoolMathEngine
    {
        public IEnumerable<int> GetFirstHundredThousandNaturalNumbers()
        {
            for (int i = 0; i < 100000; ++i)
                yield return i;
        }
    }

People start downloading your class and it becomes so popular that they want you to host it in an external service, as a remote component.

Fine, you say, and you pick .NET remoting to provide remote access. You pat yourself on the back for thinking ahead and using an enumerator model – it scales rather nicely. You write the plumbing code to host the class, and with a victorious smile on your lips, you write a test client that instantiates and accesses the method.

Only to find that it crashes with a System.Runtime.Serialization.SerializationException that says a type that you didn’t even write is not marked serializable.

That’s when it hits you, or at least that’s when it hit me, when I was modifying FinalizeTypeFinder to get it to load in a different AppDomain.

Because yield return was used, what actually gets back to the caller is an instance of a compiler generated class that implements IEnumerable<int>, and that autogenerated class is not marked serializable. Nor does it derive from MarshalByRefObject, so there’s no way it can be remoted.
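If you’d like to check this for yourself, a little reflection on the returned object is enough. The names below (IteratorTypeDemo, Numbers) are made up for the sketch, and the exact type name printed is up to the compiler, so I haven’t listed it.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class IteratorTypeDemo
{
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 3; ++i)
            yield return i;
    }

    public static void Run()
    {
        Type type = Numbers().GetType();

        // A compiler-chosen name, not one you wrote.
        Console.WriteLine(type.Name);

        // The class is generated by the compiler...
        Console.WriteLine(Attribute.IsDefined(type, typeof(CompilerGeneratedAttribute)));  // True

        // ...and it is neither [Serializable] nor a MarshalByRefObject.
        Console.WriteLine(Attribute.IsDefined(type, typeof(SerializableAttribute)));       // False
        Console.WriteLine(typeof(MarshalByRefObject).IsAssignableFrom(type));              // False
    }
}
```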

The only real solution I can think of is to write a custom implementation of IEnumerable<T>, the way enumerators had to be hand-written back in the C# 1.x days. Yet another instance where compiler magic doesn’t quite work out in a particular scenario, I suppose.
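Here is roughly what such a hand-written version could look like. Treat it as a sketch: NaturalNumbers and its nested Enumerator are my own names, and deriving from MarshalByRefObject is just one of the two options (marking both classes [Serializable] instead would copy them to the caller rather than hand out a proxy).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written replacement for the yield-based method.
public class NaturalNumbers : MarshalByRefObject, IEnumerable<int>
{
    private readonly int count;

    public NaturalNumbers(int count) { this.count = count; }

    public IEnumerator<int> GetEnumerator() { return new Enumerator(count); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // The enumerator crosses the boundary too, so it needs the same treatment.
    private sealed class Enumerator : MarshalByRefObject, IEnumerator<int>
    {
        private readonly int count;
        private int current = -1;

        public Enumerator(int count) { this.count = count; }

        public int Current { get { return current; } }
        object IEnumerator.Current { get { return current; } }

        // Yields 0, 1, ..., count - 1, like the original loop.
        public bool MoveNext()
        {
            if (current + 1 >= count) return false;
            ++current;
            return true;
        }

        public void Reset() { current = -1; }
        public void Dispose() { }
    }
}
```

With the MarshalByRefObject flavour, every MoveNext and Current call goes through the proxy, so the streaming comes at the cost of one remote call per element.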


My colleague Soundar discovered this rather interesting behavior.

  1: class Test
  2: {
  3:     public static void Main()
  4:     {
  5:         Test test = null;
  6: 
  7:         Console.WriteLine("{0}", test);
  8:         Console.WriteLine("{0}", null);
  9:     }
 10: }

If you run this code, you’ll find that while line 7 prints an empty line, line 8 causes an ArgumentNullException. Note that the test reference is also null, so it should certainly surprise you that the two lines result in different behavior at runtime.

It certainly surprised me enough to make me dig deeper into the reason for the difference. I reasoned that given that the parameter values are identical at runtime, the discrepancy must happen because of a compiler operation – probably method overloading. And sure enough, the two calls resolve to different overloads.

Line 7 resolves to

public static void WriteLine(string format, object arg0);

whereas Line 8 resolves to

public static void WriteLine(string format, params object[] arg);

A peek at the source code using Reflector showed that the second overload throws if arg is null, whereas the first one packs arg0 into an object array and calls the second overload.

Ok, but why did the compiler pick different overloads?

Intuitively, for a method call with a single parameter, you’d expect the overload resolution algorithm to choose a single parameter method over a method with variable number of arguments. And that’s what the compiler did on line 7.

On line 8, the situation is different – null is directly assignable to arg0 and to arg. The overload resolution algorithm had to choose the best function, and it chose the one with the object array.

That appears counter intuitive, until you have code like

  1: class Test
  2: {
  3:     public static void Main()
  4:     {
  5:         SubTest subTest = null;
  6:         M(subTest);
  7:     }
  8: 
  9:     static void M(Test t) { Console.WriteLine("Test"); }
 10:     static void M(SubTest s) { Console.WriteLine("SubTest"); }
 11: }
 12: 
 13: class SubTest : Test { }

You wouldn’t be surprised if the call at line 6 resolved to M(SubTest), would you?

The C# spec’s rules for determining the best match say that

“ Given an implicit conversion C1 that converts from a type S to a type T1, and an implicit conversion C2 that converts from a type S to a type T2, the better conversion of the two conversions is determined as follows:

  • If T1 and T2 are the same type, neither conversion is better.
  • If S is T1, C1 is the better conversion.
  • If S is T2, C2 is the better conversion.
  • If an implicit conversion from T1 to T2 exists, and no implicit conversion from T2 to T1 exists, C1 is the better conversion.
  • If an implicit conversion from T2 to T1 exists, and no implicit conversion from T1 to T2 exists, C2 is the better conversion.
  • … “

In this case, SubTest (T1) is implicitly convertible to Test (T2), and therefore the compiler picks M(SubTest).

Now in our case, the compiler was trying to pick the better of two conversions: null to object, and null to object[]. Applying the same rule as above, object[] is implicitly convertible to object (but not the other way around), and therefore the overload resolution algorithm chose WriteLine(string format, params object[] arg). The params modifier didn’t play a part in overload resolution in this (null) case.
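The same pair of overloads is easy to reproduce without Console.WriteLine. In this sketch, OverloadDemo and Describe are made-up names; each overload simply reports which one was picked.

```csharp
using System;

public static class OverloadDemo
{
    public static string Describe(string format, object arg0)
    {
        return "object";
    }

    public static string Describe(string format, params object[] args)
    {
        return args == null
            ? "object[] (null array)"
            : "object[] (length " + args.Length + ")";
    }

    public static void Run()
    {
        string text = null;

        // A typed null reference only converts to object, so the first overload wins.
        Console.WriteLine(Describe("{0}", text));          // object

        // The null literal converts to both; object[] is the more specific target.
        Console.WriteLine(Describe("{0}", null));          // object[] (null array)

        // A cast restores the intuitive choice.
        Console.WriteLine(Describe("{0}", (object)null));  // object

        // Two arguments can only match the params overload in its expanded form.
        Console.WriteLine(Describe("{0}", 1, 2));          // object[] (length 2)
    }
}
```

So if you really do want to pass a single null argument to a format call, casting it to object is enough to steer the compiler to the single-argument overload.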

Interesting, ain’t it?


If you’ve done any multithreaded programming at all, you must be aware of the volatile modifier. When a field is marked volatile, it tells

1. the JIT compiler that it can’t hoist reads of the field (out of a loop, for instance), because the field may be modified by multiple threads

2. the CLR that the field must be read from and written to with acquire and release semantics.

Given what you’ve read above, the post’s title doesn’t make sense. A local variable, by definition, cannot be accessed from multiple threads. An object referred to by a local variable can be shared among threads, but never the variable itself.

Well, that was true as long as local variables remained just that – local variables. The 2.0 release of C# brought closures to the language, and C# implements capturing of local variables by making them members of a generated class. Now do you see the problem?

    public static void Main()
    {
        bool stopRunning = false;

        Thread t = new Thread(() =>
            {
                while (!stopRunning)
                    Console.WriteLine("Hello");
            });
        t.Start();
        DoSomethingElse();
        stopRunning = true;
    }

Nothing out of the ordinary here – I’m creating a thread, passing a lambda to the Thread constructor, and capturing stopRunning inside the lambda.

This code isn’t correct though – for the reasons mentioned in the initial paragraph of this post, stopRunning needs to be declared with the volatile modifier. Unfortunately, you can’t make stopRunning volatile – the compiler complains that local variables cannot be marked volatile.

Oops.

Making stopRunning a member of the class will solve the immediate problem – you can then mark the field volatile, and all is good. However, left at that, it now makes the class non-threadsafe – two threads could call Main, and stopRunning will be shared between them.
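One possible way around both problems, offered as a sketch rather than the definitive answer, is to put the flag in a tiny object of its own and capture that instead. StopFlag and VolatileFlagDemo are made-up names, and the Sleep merely stands in for DoSomethingElse().

```csharp
using System;
using System.Threading;

public static class VolatileFlagDemo
{
    // Hypothetical helper: a small heap object whose field can be volatile.
    private sealed class StopFlag
    {
        public volatile bool Value;
    }

    public static int Run()
    {
        // One flag per call, so concurrent callers never share it.
        var stop = new StopFlag();
        int iterations = 0;

        var worker = new Thread(() =>
        {
            // The volatile read cannot be hoisted out of the loop.
            while (!stop.Value)
                iterations++;
        });
        worker.Start();

        Thread.Sleep(50);      // stand-in for DoSomethingElse()
        stop.Value = true;
        worker.Join();         // after Join, reading iterations is safe

        return iterations;
    }
}
```

The lambda still captures a local, but the local is now just a reference that never changes after the thread starts; the value that actually changes lives in a field that is allowed to be volatile.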

I guess this is the price to pay for compiler magic – magic that enables seamless access to local variables from anonymous methods.


ForEachMethodInFile is a Visual Studio macro that lets you do custom actions for each method defined in the current file. I’ve used it in the past to generate logging code to log the start and end of each method, to generate default error handling code etc. And today someone over at CodeProject wanted to generate a breakpoint at the start of each method. The common thread here is performing some custom action for each method in the current file. I figured it would be useful for a lot of people if the “for each method in file” logic were available as a separate library, and that is what ForEachMethodInFile is.

The macro project is available here. To provide a custom action, all you have to do is create a new macro project and write

    Imports System
    Imports EnvDTE
    Imports EnvDTE80
    Imports EnvDTE90
    Imports System.Diagnostics
    Imports ForEachMethodInFile

    Public Module BreakOnEachMethodEntry

        Public Sub Run()
            ForEachMethod.DoAction(AddressOf AddBreakPoint)
        End Sub

        ' Callback invoked once per method; sets a breakpoint on the function by name.
        Sub AddBreakPoint(ByVal method As CodeFunction)
            DTE.Debugger.Breakpoints.Add(method.Name)
        End Sub

    End Module

You define a callback function that takes the COM object representing a method as a parameter, and then pass on the function to the DoAction method. Your function will be called for each method defined in the file, and the cursor will be positioned at the start of the method before the callback occurs. In this case, I want to add a breakpoint, so I call the appropriate DTE method, passing the current method’s name to tell it where to place the breakpoint.

Here’s the entire code in the ForEachMethodInFile project – it basically recursively traverses code elements in the file, and invokes the callback when it runs into a method.

    Imports System
    Imports EnvDTE
    Imports EnvDTE80
    Imports System.Diagnostics

    Public Module ForEachMethod

        ' Invokes action once for every method defined in the active document.
        Public Sub DoAction(ByRef action As Action(Of CodeFunction))
            ProcessFile(action)
        End Sub

        Sub ProcessFile(ByRef action As Action(Of CodeFunction))
            Dim selection As EnvDTE.TextSelection
            Dim projectItem As ProjectItem
            Dim fileCodeModel As FileCodeModel
            Dim codeElement As CodeElement
            Dim i As Integer

            projectItem = DTE.ActiveDocument.ProjectItem

            ' Walk the top-level code elements (collections here are 1-based).
            fileCodeModel = projectItem.FileCodeModel
            For i = 1 To fileCodeModel.CodeElements.Count
                codeElement = fileCodeModel.CodeElements.Item(i)
                ProcessCodeElement(codeElement, action)
            Next

            ' Reformat the modified code
            selection = DTE.ActiveDocument.Selection
            selection.SelectAll()
            selection.SmartFormat()
        End Sub

        Sub ProcessNamespace(ByVal namespaceElement As CodeNamespace, ByRef action As Action(Of CodeFunction))
            Dim i As Integer
            Dim codeElement As CodeElement

            For i = 1 To namespaceElement.Members.Count
                codeElement = namespaceElement.Members.Item(i)
                ProcessCodeElement(codeElement, action)
            Next
        End Sub

        ' Dispatches on the kind of element: namespaces and classes are traversed, methods are processed.
        Sub ProcessCodeElement(ByVal codeElement As CodeElement, ByRef action As Action(Of CodeFunction))
            If codeElement.Kind = vsCMElement.vsCMElementNamespace Then
                ProcessNamespace(codeElement, action)
            ElseIf codeElement.Kind = vsCMElement.vsCMElementClass Then
                ProcessType(codeElement, action)
            ElseIf codeElement.Kind = vsCMElement.vsCMElementFunction Then
                ProcessMethod(codeElement, action)
            End If
        End Sub

        ' Processes the methods of a class, recursing into nested classes.
        Sub ProcessType(ByVal typeElement As CodeClass, ByRef action As Action(Of CodeFunction))
            Dim i As Integer
            Dim codeElement As CodeElement

            For i = 1 To typeElement.Members.Count
                codeElement = typeElement.Members.Item(i)
                If codeElement.Kind = vsCMElement.vsCMElementFunction Then
                    ProcessMethod(codeElement, action)
                ElseIf codeElement.Kind = vsCMElement.vsCMElementClass Then
                    ProcessType(codeElement, action)
                End If
            Next
        End Sub

        Sub ProcessMethod(ByVal methodElement As CodeFunction, ByRef action As Action(Of CodeFunction))
            Dim selection As EnvDTE.TextSelection
            Dim editPoint As EnvDTE.EditPoint

            ' Abstract methods have no body to position the cursor in.
            If methodElement.MustImplement Then
                Return
            End If

            selection = DTE.ActiveDocument.Selection
            editPoint = selection.ActivePoint.CreateEditPoint()

            ' Move to start of method
            editPoint.MoveToPoint(methodElement.GetStartPoint(vsCMPart.vsCMPartBody))
            selection.MoveToPoint(editPoint)

            action(methodElement)
        End Sub
    End Module

Posted by Senthil | with no comments

WinMacro is a tiny little application that can record and replay keyboard and mouse actions that you do on your Windows desktop. It’s similar to the macro facility in Word and Excel, but works across applications.

I wrote the initial version nearly 5 years back, and it proved to be very popular, with approximately 30000 downloads and tons of email from users. Most of the emails were appreciative, and some of them touching, especially those that described how WinMacro was helping people do a better job in fields ranging from cancer research to network testing.

There were quite a few feature requests and bug reports too. WinMacro 2.0 (Beta) (http://winmacro.codeplex.com/) attempts to address some of them. The list of new features is available on the download page.

That apart, it was “interesting” to look at code I’d written 5 years ago. I found it positively revolting, to say the least. Lots of copy-pasted code, no error handling, plenty of global variables and convoluted logic made it score very high on the WTF scale. And this was code I was rather proud of at the time.

On the flip side, the fact that I found my old code disgusting means I’ve improved my coding skills enough to make that code look terrible. That is only a relative improvement though; it remains to be seen how I score on the absolute WTF scale, if there is one :)

Posted by Senthil | with no comments

In the previous blog post, we found that mutating a struct inside a class works if the struct is declared as a field, but doesn’t work if it is declared as a property.

The reason is fairly obvious – if struct fields also returned a copy, then there wouldn’t be any way of mutating the instance at all, even from within the declaring class.

struct S
{
    public int X;
}

class C
{
    public S S;

    void SetX()
    {
        this.S.X = 1; // Won't work if this.S returned a copy
    }
}

S would essentially act like a readonly field, except that you can’t change it even from within the constructor.

With that out of the way, let’s see how field access works under the covers – here’s the generated IL.

.method public hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (
        [0] class C c)
    L_0000: nop
    L_0001: newobj instance void C::.ctor()
    L_0006: stloc.0
    L_0007: ldloc.0
    L_0008: ldflda valuetype S C::S
    L_000d: ldc.i4.1
    L_000e: stfld int32 S::X
    L_0013: ret
}

The key here is the ldflda instruction – MSDN says it “finds the address of a field in the object whose reference is currently on the evaluation stack”. In contrast, here’s how property access is compiled.

.method public hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] class C c,
        [1] int32 val)
    L_0000: nop
    L_0001: newobj instance void C::.ctor()
    L_0006: stloc.0
    L_0007: ldloc.0
    L_0008: callvirt instance valuetype S C::get_S()
    L_000d: ldfld int32 S::X
    L_0012: stloc.1
    L_0013: ret
}

C::get_S() obviously returns a copy of S, and that’s the difference – ldflda loads the address of the instance instead.

To summarize, for a struct declared in a class, the compiler disallows mutating it if it's exposed through an instance property, but allows it if it is a non-readonly field.
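
If C++ is more your thing, the same distinction can be sketched there. This is only a rough analogy, not what the C# compiler does: GetCopy and GetRef are hypothetical names made up for this example, with GetCopy standing in for the property getter (the caller receives a copy) and GetRef standing in for field access via ldflda (the caller gets at the object's own storage).

```cpp
struct S {
    int X = 0;
};

class C {
public:
    // Analogous to a C# property getter on a struct: returns a copy.
    S GetCopy() const { return s_; }

    // Analogous to field access compiled to ldflda: exposes the storage
    // that lives inside the object itself.
    S& GetRef() { return s_; }

private:
    S s_;
};

// Mutates a copy, then reports the X stored inside the object.
int MutateThroughCopy() {
    C c;
    S temp = c.GetCopy();  // temp is an independent copy
    temp.X = 1;            // changes only the copy
    return c.GetCopy().X;  // the object's own S is untouched
}

// Mutates in place, then reports the X stored inside the object.
int MutateThroughRef() {
    C c;
    c.GetRef().X = 1;      // writes into the object's own storage
    return c.GetCopy().X;  // the change is visible
}
```

MutateThroughCopy returns 0 and MutateThroughRef returns 1, which is the property-versus-field difference in miniature.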

How does the compiler detect mutation though? What happens if I call a method on the struct, rather than change a field inside it? More about it in the next post.

Posted by Senthil | with no comments

Take a look at the following short snippet of code.

using System;

struct S
{
    public int X;
}

class C
{
    /* More code here */
}

class Test
{
    public static void Main()
    {
        C c = new C();
        c.S.X = 1;
    }
}

Without knowing the type definition of C, can you tell whether the code will compile, much less work?

Turns out that you can’t. It depends on whether the S in c.S is defined as a field or a property.

1.
class C
{
    public S S;
}
2.
class C
{
    public S S { get; private set; }
}

The first type definition will compile; the second won’t (error CS1612: Cannot modify the return value of 'C.S' because it is not a variable).

Can you guess why?

Let’s assume the compiler does allow the second type definition to compile. Would you expect the value of X in the instance of S inside C to change? That is, what would be the output of

public static void Main()
{
    C c = new C();
    c.S.X = 1;
    Console.WriteLine(c.S.X);
}

If you’re expecting it to be 1, then you have just broken the value semantics of a struct; S might as well have been defined as a class. The above code is logically identical to

public static void Main()
{
    C c = new C();
    S temp = c.S;
    temp.X = 1;
    Console.WriteLine(c.S.X);
}

Because S is a struct, the property getter always returns a copy (temp), and changing the copy will have no effect on the original instance. With c.S.X = 1, you can’t access the copy either, so the only effect of executing that statement will be to make the poor developer’s eyes go wide as he steps through the code in the debugger, wondering why a simple field assignment refuses to work.

So, the C# compiler is being helpful here by not allowing this kind of code to compile.

Right, so why does it compile if S is defined as a field rather than a property? We’ll see why in the next post. Meanwhile, feel free to post why you think it works.

Posted by Senthil | 1 comment(s)

Ever since I happened to stumble upon this book on Data Mining, I've been hooked. So much so that I've been brushing up on statistics and probability distributions just to follow along the book.

I'm currently reading the chapter on classification using decision trees, and what appeals to me is how the technique reduces huge sets of data (records) into simple models that can both describe the data and predict the class of previously unseen records. If all that goes right over your head, here's a simple example.

P1     P2     Class (Result)
True   True   True
False  False  False
True   False  False
False  True   False

Given the above tabular values, can the computer figure out that it is the truth table for P1 ^ P2 (with ^ representing logical conjunction, not XOR)? If it can, it has

1. A model that describes the data, i.e., the data is the conjunction of the two boolean variables.

2. An evaluator that can calculate the result, given P1 and P2, i.e., predict the class. Since all combinations of values are already available, there are no unseen records in this case.

A decision tree for the above table looks like this:

[image: decision tree for P1 ^ P2]

Green edges and red edges represent true and false values, respectively, for the attribute represented by the node. As you can see, the non-leaf nodes each represent an attribute, with one edge for each possible value of that attribute. The leaf nodes represent the result (class). Since we are dealing with binary attributes, the decision tree turns out to be a binary tree. Note that you can find out the result of { P1 = True, P2 = True } by simply navigating the tree and reading the value of the leaf node (green edge from P1 to P2, green edge from P2 to T, and then read the value, T = true).
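
Navigating the tree takes only a few lines of code. Here is a minimal C++ sketch of that walk; the Node layout and the Leaf, Split, BuildConjunctionTree and Predict names are made up for illustration, and the tree is built by hand rather than learned from the data.

```cpp
#include <memory>
#include <utility>

// A node is either a leaf carrying a class label, or an internal node that
// tests one boolean attribute and has one child per attribute value.
struct Node {
    bool is_leaf = false;
    bool label = false;              // meaningful only for leaves
    int attribute = 0;               // 0 = P1, 1 = P2; internal nodes only
    std::unique_ptr<Node> if_true;   // the "green" edge
    std::unique_ptr<Node> if_false;  // the "red" edge
};

std::unique_ptr<Node> Leaf(bool label) {
    auto node = std::make_unique<Node>();
    node->is_leaf = true;
    node->label = label;
    return node;
}

std::unique_ptr<Node> Split(int attribute,
                            std::unique_ptr<Node> if_true,
                            std::unique_ptr<Node> if_false) {
    auto node = std::make_unique<Node>();
    node->attribute = attribute;
    node->if_true = std::move(if_true);
    node->if_false = std::move(if_false);
    return node;
}

// Hand-built tree for the table: test P1 first, then P2 on the true branch.
std::unique_ptr<Node> BuildConjunctionTree() {
    return Split(0, Split(1, Leaf(true), Leaf(false)), Leaf(false));
}

// Follows edges from the root until a leaf is reached, then returns its label.
bool Predict(const Node& root, bool p1, bool p2) {
    const Node* node = &root;
    while (!node->is_leaf) {
        bool value = (node->attribute == 0) ? p1 : p2;
        node = value ? node->if_true.get() : node->if_false.get();
    }
    return node->label;
}
```

Calling Predict(*BuildConjunctionTree(), true, true) follows the two green edges and lands on the True leaf; every other combination ends on a False leaf.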

This was so interesting that I actually wrote a generic decision tree builder in C# (download). It uses Hunt's Algorithm to build the decision tree and uses Entropy as the measure to select attributes. At the moment, it can work only on records with nominal attributes (attributes like enums whose values are limited to a finite set). After training, it returns a decision tree, which can then be exported to XML or can be used to predict classes for new records.
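
For the curious, here is roughly what the entropy measure works out to for the truth table above. This is a C++ sketch of the textbook definitions of entropy and information gain, not the tree builder's actual code; Row, Entropy, InformationGain and TruthTable are names invented for the example.

```cpp
#include <cmath>
#include <vector>

struct Row {
    bool p1;
    bool p2;
    bool cls;
};

// One term of the entropy sum; a probability of zero contributes nothing.
double EntropyTerm(double probability) {
    return probability > 0.0 ? -probability * std::log2(probability) : 0.0;
}

// Shannon entropy (in bits) of the class labels in rows.
double Entropy(const std::vector<Row>& rows) {
    if (rows.empty()) return 0.0;
    double positives = 0.0;
    for (const Row& row : rows) {
        if (row.cls) positives += 1.0;
    }
    double p = positives / static_cast<double>(rows.size());
    return EntropyTerm(p) + EntropyTerm(1.0 - p);
}

// Entropy reduction from splitting rows on P1 (attribute 0) or P2 (attribute 1).
double InformationGain(const std::vector<Row>& rows, int attribute) {
    if (rows.empty()) return 0.0;
    std::vector<Row> yes, no;
    for (const Row& row : rows) {
        bool value = (attribute == 0) ? row.p1 : row.p2;
        (value ? yes : no).push_back(row);
    }
    double n = static_cast<double>(rows.size());
    double remainder = (yes.size() / n) * Entropy(yes) + (no.size() / n) * Entropy(no);
    return Entropy(rows) - remainder;
}

// The four records of the truth table for P1 ^ P2.
std::vector<Row> TruthTable() {
    return {{true, true, true},
            {false, false, false},
            {true, false, false},
            {false, true, false}};
}
```

For this table the class column has an entropy of about 0.811 bits, and splitting on either P1 or P2 gives the same gain of about 0.311 bits, so either attribute is an equally good choice for the root.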

Enough of the theory, let's see it in action. Here's how code that uses my tree builder looks.

class TruthTableEntry : Record<bool>
{
    public bool P1 { get; set; }
    public bool P2 { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as TruthTableEntry;
        if (other == null)
            return false;

        return other.P1 == this.P1 && other.P2 == this.P2;
    }

    public override int GetHashCode()
    {
        return P1.GetHashCode() ^ P2.GetHashCode();
    }
}
 
static void Main(string[] args)
{
   TruthTableEntry[] table = 
   {
       new TruthTableEntry() { P1 = true, P2 = true, Class = true },
       new TruthTableEntry() { P1 = true, P2 = false, Class = false},
       new TruthTableEntry() { P1 = false, P2 = true, Class = false},
       new TruthTableEntry() { P1 = false, P2 = false, Class = false }
   };

   var tree = new TreeBuilder<TruthTableEntry, bool>().Train(table);
   string predicate = FormPredicate(tree.RootNode, "");
   Console.WriteLine(predicate);
}

The TruthTableEntry class represents a row in the truth table above, and an array of those records is given to the TreeBuilder instance's Train method. The decision tree returned by it is then translated into a predicate by the FormPredicate method. Here's how the tree looks in XML if you call Save on the tree.

<?xml version="1.0" encoding="utf-8"?>
<Node ParentConditionValue="" SplitAttribute="P1">
  <Node ParentConditionValue="True" SplitAttribute="P2">
    <Node ParentConditionValue="True" Class="True" />
    <Node ParentConditionValue="False" Class="False" />
  </Node>
  <Node ParentConditionValue="False" Class="False" />
</Node>

which exactly matches what we expected.

[image]

The FormPredicate method attempts to transform the tree into a predicate clause, as expressed mathematically in predicate logic. For the above tree, it returns

(P1 ^ P2) v (~P1 ^ False)

which, when reduced through the laws of predicate logic, gives

(P1 ^ P2) v (~P1 ^ False)

= (P1 ^ P2) v False (since False ^ P == False)

= (P1 ^ P2)         (since False v P == P)

And there you have it, the computer figured it out correctly. Yowza :)

If you aren't familiar with implementing the Dispose/Finalize pattern to clean up resources, stop and read this first. It is really important to get the implementation of the pattern right, or you'll run into a performance problem at best, or risk leaking and running out of resources at worst.

Assuming that you've implemented the pattern correctly for your types, how do you guarantee that Dispose is called on all finalizable types in your application? Sure, you have finalizers, but depending on them for cleanup is bad because

1. There is a definite cost to finalization. An object for which a finalizer runs requires an extra garbage collection cycle to get GC'ed.

2. Your finalizers may not run at all. If your application chews up resources in a way that doesn't trigger a garbage collection, you might run out of resources and crash long before the GC (and the finalizer) decides to run.

Finding undisposed objects implementing finalizers becomes important then. Windbg with SOS has the !finalizequeue command that can show you objects that are ready for finalization. If you have only a few distinct types in that list that are created and used in very few places, !finalizequeue is all you need - just look for missing Dispose calls in those places, add them and you're done.

Oftentimes it isn't that simple though - a certain type could be used throughout your code base and hunting down missing Dispose calls on that type will quickly turn into an exercise in frustration. Wouldn't it be great if a tool could show you the list of finalized objects, along with something that tells you where they were created and used?

Undisposed (http://undisposed.codeplex.com/) does exactly that. It uses the CLR profiling API to watch for finalizations and object allocations, and matches them to show you constructor stack traces for finalized objects. All you have to do is launch your application from Undisposed and then run your application through a set of scenarios exercising various code paths. Once your application is closed, Undisposed's Log Viewer (screenshot below) will open up, showing you the number of undisposed objects per type, with each type grouped by constructor stack trace.

log viewer screenshot

If you do download the software, remember to register the Undisposed COM dll using

regsvr32 Undisposed.dll

The software is still in Alpha, so please treat it like any pre-release software. Based on my very limited testing, it works reasonably well when you give it a limited set of types - otherwise, it tends to slow down the host application a little too much.

I'm looking for feedback and ideas, so feel free to comment.

PS : The inspiration for Undisposed came from this forum post on CodeProject - that's when I realized I could automate the whole thing :)

Posted by Senthil | with no comments

I'll bet a hundred bucks that any entry level C++ interview or exam will somehow drift into questions about the pre and post increment operators. It's almost become a canonical, rite of passage sort of thing.

Now using the operators is one thing, overloading them for your own types is another. In C++, you write something like

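class X {
    int val;
public:

    X() : val(0) { }

    // Pre-increment: modify this instance and return it by reference.
    X& operator++() { val++; return *this; }
    // Post-increment: copy, modify this instance, return the old copy.
    X operator++(int) { X pX = *this; val++; return pX; }

    int Value() const { return val; }
};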

Fairly straightforward stuff - C++ uses the int overload to distinguish between pre and post, and the post increment overload copies itself before incrementing and returns the copy.
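
A quick way to convince yourself of the difference is to capture what each operator returns. The sketch below repeats a class like the one above (with the pre-increment returning a reference, which is the usual C++ convention); Result, PostIncrement and PreIncrement are helper names invented for this example.

```cpp
class X {
    int val = 0;
public:
    // Pre-increment: modify this instance and return it.
    X& operator++() { ++val; return *this; }
    // Post-increment: remember the old state, modify, return the old state.
    X operator++(int) { X old = *this; ++val; return old; }
    int Value() const { return val; }
};

struct Result {
    int returned;  // value seen through the operator's return value
    int after;     // value of the incremented object afterwards
};

Result PostIncrement() {
    X x;
    X y = x++;  // y receives the copy taken before the increment
    return {y.Value(), x.Value()};
}

Result PreIncrement() {
    X x;
    X y = ++x;  // y is copied from x after the increment
    return {y.Value(), x.Value()};
}
```

PostIncrement yields {0, 1} and PreIncrement yields {1, 1}: in both cases x ends up at 1, and only the value handed back differs.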

Now let's see how to do this in C#.

class X
{
    public int Value { get; set; }

    public static X operator ++(X p)
    {
        int x = p.Value;
        return new X() { Value = x + 1 };
    }
}

No, there's no separate overload for the other one - the same method works for both pre and post increment operations. The compiler does the work of generating code that exhibits correct pre and post increment behavior. For code like

X x = new X();
X y = x++;

it generates

X x = new X();
X y = x;
x = X.op_Increment(x);

and for code like

X x = new X();
X y = ++x;

it generates

X x = new X();
X y = x = X.op_Increment(x);

Nifty, eh?

But wait, did you notice the difference between the C++ and C# overload implementation? The C# overload does not modify the original at all and always returns a copy, whereas the C++ code always modifies the current instance and returns a copy only for the post increment overload. Looking at the generated code, it should be easy to understand why - the compiler can't do its magic if the overload tinkers with the original instance.

Did you notice that X is a reference type?

X x = new X();
X y = x;

x++;

Given that x and y are referring to the same object after executing the second line, shouldn't the increment be visible from y as well? It doesn't happen though, because the overload returns a new instance that then gets assigned to x, leaving y referring to the "old" unmodified instance.
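
The aliasing behavior can be modeled in C++ too, using std::shared_ptr as a stand-in for a C# reference. This is only an analogy under that assumption; XRef, OpIncrement and IncrementWithAlias are hypothetical names, and OpIncrement mimics the C# overload by building a new object instead of touching its argument.

```cpp
#include <memory>

struct X {
    int Value = 0;
};

using XRef = std::shared_ptr<X>;  // stand-in for a C# reference to X

// Mirrors the C# overload: leaves its argument alone and returns a new object.
XRef OpIncrement(const XRef& p) {
    auto result = std::make_shared<X>();
    result->Value = p->Value + 1;
    return result;
}

struct Outcome {
    int x_value;
    int y_value;
    bool same_object;
};

Outcome IncrementWithAlias() {
    XRef x = std::make_shared<X>();
    XRef y = x;          // y now refers to the same object as x
    x = OpIncrement(x);  // models what the compiler emits for x++ as a statement
    return {x->Value, y->Value, x == y};
}
```

After the call, x refers to a new object whose Value is 1, while y still refers to the old object with Value 0, so the two no longer alias each other.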

All this confusion will not arise if X is a value type. The assignment of x to y will create a copy, so there's no possibility of changes in x reflecting in y. And modifying the passed instance from within the operator overload will have no effect on the original instance either.

Moral of the story : watch out when overloading operators on reference types.

Posted by Senthil | with no comments

If you use Skype, do you know that you can program against it? Head over to developer.skype.com if you're interested. There's a COM API, one for Java and even one for Python.

Just to show how easy it is, we'll write a bot in .NET that will simply echo whatever is sent to it.

You first need to download Skype4COM, a COM library provided by Skype developers to control Skype.

Create a WinForms application in your favorite version of Visual Studio (>= 2005), and in the default Form class, paste the following piece of code

public partial class MainForm : Form
{
    Skype skype;
    Dictionary<string, Session> userSessions = new Dictionary<string, Session>();

    public MainForm()
    {
        InitializeComponent();
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        skype = new SkypeClass();
        skype.Attach(5, false);
        skype.MessageStatus += new _ISkypeEvents_MessageStatusEventHandler(skype_MessageStatus);
    }

    void skype_MessageStatus(ChatMessage pMessage, TChatMessageStatus Status)
    {
        if (Status == TChatMessageStatus.cmsReceived)
        {
            string text = pMessage.Body;
            string response = "Echoing " + text;
            skype.SendMessage(pMessage.FromHandle, response);
        }
    }

    private void Form1_FormClosing(object sender, FormClosingEventArgs e)
    {
        skype.MessageStatus -= new _ISkypeEvents_MessageStatusEventHandler(skype_MessageStatus);
    }
}

Add a reference to Skype4COM and build the code. Make sure Skype is running before launching this application. You'll need to allow this application to access Skype by hitting the "Accept" button when Skype asks; the Skype developers put that prompt in to keep malicious code from taking control.

That's all there is to it. If you now send a message to the currently signed in user, you should get back the same message text prefixed by "Echoing".

There are a lot of other things you could do with the library, but the documentation is pretty thin, so you'll have to experiment a bit to get things working.

Still, it's pretty neat and I can see plenty of situations where a bot running on Skype would be useful. Imagine chatting with your bot running on your home PC, asking it the list of running applications. Skype takes care of NAT, firewall and other network related issues for you. How cool would it be to ask your build server bot the version number of the previous week's build?

Posted by Senthil | 2 comment(s)

There are times when you feel really proud of yourself; on top of the world, with no one in sight. And then there are times when you can't believe you did what you just did. Here's one of the latter :-

My task was to show a progress bar in our application. Nothing fancy there - just divide the operations to be performed into equal-sized chunks and increment the bar after each one completes. Very straightforward indeed.

int increment;

private void Form_Load(object sender, EventArgs e)
{
    int discreteOperationsCount = GetDiscreteOperationsCount();
    increment = (int)Math.Ceil((double)progressBar.Maximum / discreteOperationsCount);
}

private void OperationCompleted()
{
    progressBar.Increment(increment);
}

It worked great when I tested it with 5, 10, 20 operations. I was even pleased that I cast the numerator to double straightaway, as I was typing code.

That is, until I found that the progress bar progresses too fast if the number of operations is more than 100. If the result of the division is less than 1, Math.Ceil pushes it up to 1, which means that the progress bar will be full when 100 (the default Maximum value) operations complete, regardless of the actual number of operations.

Doh.

Ok, I thought, so let's store increment as a double and call Increment with that number. It would have worked, but for the fact that Increment does not have any overloads - it only accepts an integer. Hmm.. what gives?

After a little bit of thought, I figured out a way - scale everything by the degree of precision needed. With a scaling factor of 10, the calculation now becomes

increment = (int)Math.Ceil((double)progressBar.Maximum / discreteOperationsCount * 10);

If discreteOperationsCount is 520, for example, then 100 / 520 * 10 ≈ 1.92, which Math.Ceil rounds up to 2; with a factor of 100 the result would be about 19.2, which rounds up to 20. We still have an integer, but it is more precise than the one calculated earlier. Note that the bar's actual range has to be scaled by the same factor (1000 rather than 100 here); otherwise the larger increments would just fill the bar sooner.

Of course, you could increase the scaling factor further, and each additional factor of 10 buys one more significant digit. But the progress bar will have to do some approximation when mapping progress to pixels on the screen, so beyond a point, the increase in precision doesn't pay off that much.
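
To see how much the scaling actually buys, here is the arithmetic as a small C++ sketch. It assumes the bar's range is multiplied by the scaling factor (so 100 becomes 1000, 10000 and so on) and that the increment is still rounded up; IncrementFor and OperationsUntilFull are hypothetical helper names, not part of any progress bar API.

```cpp
#include <cmath>

// Increment applied per completed operation when the bar's range is maximum.
int IncrementFor(int maximum, int operations) {
    return static_cast<int>(std::ceil(static_cast<double>(maximum) / operations));
}

// Number of completed operations after which the bar first reads full.
int OperationsUntilFull(int maximum, int operations) {
    int increment = IncrementFor(maximum, operations);
    return (maximum + increment - 1) / increment;  // integer ceil(maximum / increment)
}
```

For 520 operations, an unscaled range of 100 gives an increment of 1 and a bar that is full after 100 operations. Ranges of 1000 and 10000 give increments of 2 and 20, and the bar fills after 500 operations. A range of 100000 gives an increment of 193 and a bar that fills after 519 operations, which is close enough to the real total that nobody will notice.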

Like I said at the top of this post, I couldn't believe I missed testing with the number of operations greater than the progress bar's Maximum. Truly one of those 'doh' moments.

There have been a lot of blog posts about why calling Thread.Abort from one thread to abort another is a bad idea. If you're still wondering if it actually is that bad, you'll be convinced by the end of this blog post :).

The other day, I was investigating frequent hangs in our application on the production machine. The only clue was that the last logged exception was a ThreadAbortException. What made it interesting was that in some cases, the application continued to run fine after logging the exception. As I continued to look at the log data, I realized that every time the application hung, the ThreadAbortException's stack trace had a particular third party library's method at the top of the stack. The conclusion was that the application hung whenever the thread was aborted while executing the third party library's code.

Reflector and half an hour later, the reason for hanging became clear. The third party code looked like this :-

void DoX()
{
    Monitor.Enter(lockObj);
    //Do some interesting stuff
    Monitor.Exit(lockObj);
}

Now do you see why?  If the ThreadAbortException gets thrown after acquiring the lock but before releasing it, the lock gets orphaned and no other thread will be able to acquire it. Ever. The third party library is used heavily by our app, so any thread that calls DoX after the abort will hang forever.

OK, so let's set it right then.

void DoX()
{
    lock(lockObj)
    {
        //Do some interesting stuff
    }
}

The compiler will emit a try/finally block and will call Monitor.Exit from the finally block. That should solve the problem, right?

Maybe.

The CLR guarantees that it won't throw ThreadAbortExceptions when running finally blocks (and CERs and constructors), so once the finally block starts executing, we're safe. But what if there is (JIT compiler generated) code that runs after the try block but before the finally block?

Even assuming there is no such code, things can still fail.

void DoY()
{
    using (StreamReader sr = new StreamReader("Test.txt"))
    {}
}

If a ThreadAbortException is thrown after the StreamReader constructor runs, but before the constructed instance gets assigned to the local variable (sr), then sr won't be disposed.

The bottom line is that it is really hard to write code that works properly in the face of ThreadAbortExceptions. And even if you somehow manage to write it, you can never be sure that all your external libraries are written to cope with it.

Moral of the story : Listen to all the good advice and don't call Thread.Abort from another thread :)

Posted by Senthil | 1 comment(s)

C++ itself is a pretty complex language, and C++/CLI, with its own baggage of things like handles and managed references, doesn't make it any easier to read or debug code at a glance. Here's a piece of code that had me head-scratching for a while.

enum class Options { Yes, No, Maybe };

void Func(Options ^g)
{
    Console::WriteLine(g == Options::Yes);
}

void Func()
{
    Func(Options::Yes);
}

If you run the above piece of code, it will print False. Pretty weird huh?

It seems inexplicable, until you notice the caret sign (^) before g in the formal parameter declaration of Func. The caret symbol causes the enum to be boxed, and the == operator then does a reference comparison, which will obviously fail because a valid reference handle will never be zero. To verify that it is indeed comparing addresses, try changing the code to read g->Equals(Options::Yes) - you'll find that it prints True. Of course, taking away the caret sign will work as well, and that is what the programmer intended in this particular case - the caret was a typo.

I'm surprised that the compiler doesn't offer much help here, not even a warning, especially for the part where it does a reference comparison between a handle and an int. The equivalent C# code

((object)g) == Options.Yes

fails with a compiler error.

With great power comes great responsibility, I guess :)

Posted by Senthil | 1 comment(s)

Implementing IEnumerable<T> can turn out to be tricky in certain cases. Consider the following code snippet.

using System;
using System.Collections.Generic;
using System.Linq;

namespace Test
{
    class Program
    {
        static void Main(string[] args)
        {
            Consume(new List<string>() { "a", "b", "c" });
        }

        static void Consume<T>(IEnumerable<T> stream)
        {
            T t1 = stream.First();
            T t2 = stream.First();

            Console.WriteLine(t1.Equals(t2));
        }
    }
}

 

As you'd expect, it prints true. Each First() call results in a call to stream.GetEnumerator(), and each such enumerator returns elements from the beginning of the list, so calling First() twice returns the same (first) element. All good so far.

Here's a tiny class.

class StringGenerator
{
    int index;

    public string GetNext()
    {
        return (index++).ToString();
    }
}

As you can see, it generates whole numbers as strings. It's not very convenient to use though. Wouldn't it be great if we could wrap it and make it enumerable?

static IEnumerable<string> ConvertToEnumerable(StringGenerator g)
{
    string item = null;

    while ((item = g.GetNext()) != null)
        yield return item;
}

ConvertToEnumerable simply loops over the list of items generated and makes use of yield return to make it enumerable.

Great, now what does

        Consume(ConvertToEnumerable(new StringGenerator()));

print?

It prints false.

False? FALSE? Can you figure out the reason?

If you've read Raymond's posts on implementation of iterators, you should have figured it out by now. The crux of the problem is that all enumerators returned share the same instance of StringGenerator.  Calling First() twice results in two calls to GetNext() on the same StringGenerator instance, and the values returned will obviously be different. To verify that, try creating the StringGenerator instance inside the ConvertToEnumerable function - it will print true now.
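
The same effect shows up in any language once two consumers share one stateful source. Here is a small C++ sketch of it; FirstFromGenerator and FirstFromList are made-up stand-ins for calling First() on the two kinds of sequences, not real LINQ equivalents.

```cpp
#include <string>
#include <vector>

class StringGenerator {
    int index = 0;
public:
    // Returns "0", "1", "2", ... on successive calls.
    std::string GetNext() { return std::to_string(index++); }
};

// Stand-in for First() on a sequence backed by a shared generator:
// every call pulls the next item from the same underlying source.
std::string FirstFromGenerator(StringGenerator& generator) {
    return generator.GetNext();
}

// Stand-in for First() on a list: every call starts over at the beginning.
std::string FirstFromList(const std::vector<std::string>& items) {
    return items.front();
}
```

Two calls to FirstFromGenerator on the same generator return "0" and then "1", whereas two calls to FirstFromList on the same list return the same element both times.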

This bit me when I wrote code that parsed stuff out of an IEnumerable<string> instance. The actual program read text from a file, so I had a ConvertToEnumerable routine just like the one above, except that it took TextReader as the parameter. The Consume method passed the constructed IEnumerable<T> instance to various methods (say Method1 and Method2), with the assumption that whatever Method1 read off the stream won't be read again by Method2.

As we saw just now, this works if Consume is passed an IEnumerable<T> constructed like the StringGenerator case. It fails badly if a List<string> is passed instead. Because I wanted the "read elements off the stream" behavior, I called GetEnumerator() once for the passed IEnumerable<T> and then changed the methods called by Consume to take that IEnumerator<T> instead of IEnumerable<T>. That made the code work correctly for both cases.

Moral : Make sure you understand the implications when yield returning items off a shared item source.

Posted by Senthil | with no comments

Where Am I, a Windows Mobile app that I was working on, is now available for download. You can get the binaries from http://www.codeplex.com/wami. There is still a lot of fit and finish work to be done, but with the core functionality working fine, I decided to let it out in the wild and get some feedback.

There is RouteLogger.exe, which runs on the mobile phone and records cell broadcast information to a log file, along with the time interval between successive broadcasts. RouteEditor.exe, which runs on the PC, allows you to group the logged location information into names of places that you can recognize readily. RouteEditor saves the information into route files, which can then be loaded by wami.exe, running on the mobile phone. Wami gets the current cell broadcast location, indexes into the route information, and estimates the time needed to reach the final destination and intermediate points along the way.

Oh, and Wami is open source, so you can take a peek at the source code if you're interested. This being my first Windows Mobile app, your comments about the application, its usability and the source code are most welcome.
