The case of the leaking thread handles
This was one of the more challenging and interesting problems I've had to debug in a while. While testing code for an unrelated fix, I noticed that the application's handle count, as seen in the Handles column in Task Manager, kept rising steadily.
WinDbg showed that the app was leaking thread handles.
0:003> !handle
1417 Handles
Type Count
Event 452
Section 8
File 12
Port 2
Directory 3
Thread 916
Desktop 1
KeyedEvent 1
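The steady climb first seen in Task Manager can also be logged from inside the process, which makes it easier to correlate with what the app is doing. This is only a minimal sketch: HandleCountLogger and its methods are hypothetical names, not part of the original application. It relies on Process.HandleCount, which reports the total number of handles (not just thread handles).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the original application.
static class HandleCountLogger
{
    // Returns the number of handles currently open in this process.
    // A fresh Process object is used on each call so the value is not stale.
    public static int ReadHandleCount()
    {
        using (Process self = Process.GetCurrentProcess())
        {
            return self.HandleCount;
        }
    }

    // Writes the handle count to the console `samples` times,
    // pausing `intervalMs` milliseconds between readings.
    public static void Log(int samples, int intervalMs)
    {
        for (int i = 0; i < samples; i++)
        {
            Console.WriteLine("{0:HH:mm:ss} handles={1}", DateTime.Now, ReadHandleCount());
            Thread.Sleep(intervalMs);
        }
    }
}
```

Calling HandleCountLogger.Log(10, 1000) from a background thread would print ten readings, one per second. A count that never levels off is the cue to break out !handle as above.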
Half expecting the code to hold references to dead threads, I dumped Thread objects in the GC heap.
0:003> !dumpheap -stat -type System.Threading.Thread
total 353 objects
Statistics:
MT Count TotalSize Class Name
790fe284 2 144 System.Threading.ThreadAbortException
79124b74 32 640 System.Threading.ThreadHelper
791249e8 33 1056 System.Threading.ThreadStart
790fe704 286 16016 System.Threading.Thread
Total 353 objects
No luck there: only 286 Thread objects were alive, far fewer than the 916 open thread handles. This was perplexing. I knew the app created a lot of threads that do some work and die, but the Thread class is the only way the app deals with them, so if it wasn't Thread instances, what else was holding the handles open?
Reasoning that, barring a CLR bug, there must be some .NET object behind each handle, I dumped the entire GC heap and looked for types whose instance count was close to the number of handles.
0:003> !dumpheap -stat
...
7912d9bc 25 54264 System.Collections.Hashtable+bucket[]
7b483294 1098 57096 System.Windows.Forms.CreateParams
7b48193c 1098 61488 System.Windows.Forms.Control+ControlNativeWindow
7912d8f8 544 72264 System.Object[]
7b48398c 819 85176 System.Windows.Forms.Application+MarshalingControl
7b4835ec 819 108108 System.Windows.Forms.Application+ThreadContext
0014c740 40 137064 Free
790fd8c4 7016 421636 System.String
Total 22730 objects
The number of ThreadContext instances (819) was pretty close to the number of open thread handles (916). Repeating the exercise after letting the application run for some more time confirmed the theory: the number of ThreadContext instances had increased by almost the same amount as the number of open thread handles.
It should have been simple from there on: all that was left was to find the roots holding the ThreadContext instances, and the problem would be solved.
0:003> !dumpheap -mt 7b4835ec
...
0145b550 7b4835ec 132
total 819 objects
Statistics:
MT Count TotalSize Class Name
7b4835ec 819 108108 System.Windows.Forms.Application+ThreadContext
Total 819 objects
and then
0:003> !gcroot 0145b550
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 0 OSTHread 34c0
Scan Thread 2 OSTHread 34c8
DOMAIN(001546E8):HANDLE(Pinned):9f13f0:Root:02373030(System.Object[])->
013720f4(System.Collections.Hashtable)->
0142486c(System.Collections.Hashtable+bucket[])
Here's where the fun started. The main root was (System.Object[]), which indicated that the next object in the object graph (Hashtable) was a static member of some class. A search for static Hashtables in the application's source code turned up nothing. The only possibility then was the BCL - some class there must be holding a static Hashtable of ThreadContexts.
I got a lucky break there: Reflector showed that the ThreadContext class has a static Hashtable and that its constructor adds itself (this) to the Hashtable. But who instantiates ThreadContexts? Reflector's "Analyze" command turned up too many call paths to go through one by one, so I decided to do it the other way: set a breakpoint on ThreadContext's constructor and let the application run.
0:003> !DumpMT -MD 7b4835ec
EEClass: 7b483400
Module: 7b454000
Name: System.Windows.Forms.Application+ThreadContext
mdToken: 020001e4 (C:\WINDOWS\assembly\GAC_MSIL\System.Windows.Forms\2.0.0.0__b77a5c561934e089\System.Windows.Forms.dll)
BaseSize: 0x84
ComponentSize: 0x0
Number of IFaces in IFaceMap: 1
Slots in VTable: 64
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
...
7b664ad4 7b4a76b0 PreJIT System.Windows.Forms.Application+ThreadContext..ctor()
and then
0:003> !bpmd -md 7b4a76b0
MethodDesc = 7b4a76b0
Setting breakpoint: bp 7B06B934 [System.Windows.Forms.Application+ThreadContext..ctor()]
When the breakpoint was hit, the stack looked like this
0:003> !CLRStack
OS Thread Id: 0x2c90 (3)
ESP EIP
0106ea74 7b06b934 System.Windows.Forms.Application+ThreadContext..ctor()
0106ea78 7b06b8cc System.Windows.Forms.Application+ThreadContext.FromCurrent()
0106ea80 7b06b885 System.Windows.Forms.WindowsFormsSynchronizationContext..ctor()
0106ea90 7b06b7b2 System.Windows.Forms.WindowsFormsSynchronizationContext.InstallIfNeeded()
0106eabc 7b06a09b System.Windows.Forms.Control..ctor(Boolean)
0106eb30 7b068f75 System.Windows.Forms.Label..ctor()
0106eb3c 00d5012b LanguageFeatures.Program.<Main>b__0()
0106eb44 793b0d1f System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
0106eb4c 79373ecd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
0106eb64 793b0c68 System.Threading.ThreadHelper.ThreadStart()
0106ed8c 79e7c74b [GCFrame: 0106ed8c]
As the stack dump shows, the ThreadContext was created as part of System.Windows.Forms.Control's constructor, which installs a WindowsFormsSynchronizationContext as the thread's synchronization context. The problem was that the created context didn't die when the thread died. Some more poking around with Reflector showed that the instance is removed from the static Hashtable when the thread receives a quit message (via Application.ExitThread, for example). The threads in the app were not pumping messages, so the ThreadContexts kept accumulating in the Hashtable, each keeping its associated thread handle open. Here's some code that demonstrates the problem.
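using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        while (true)
        {
            // Each thread constructs a control and exits without ever pumping messages.
            new Thread(delegate() { new System.Windows.Forms.Label(); }).Start();
            Thread.Sleep(100);
        }
    }
}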
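The mechanism behind that repro can be modelled without Windows Forms at all. The sketch below is a deliberately simplified stand-in, not the framework's actual source: FakeThreadContext, LeakModel and the integer key are all hypothetical. It only illustrates the pattern described above, where a constructor registers the instance in a static Hashtable and nothing removes it unless cleanup is explicitly triggered.

```csharp
using System;
using System.Collections;
using System.Threading;

// Hypothetical stand-in for a per-thread context; NOT the framework's code.
class FakeThreadContext
{
    // Static table: everything stored here stays reachable from a GC root.
    private static readonly Hashtable table = new Hashtable();
    private static int nextId;

    private readonly int id;

    public FakeThreadContext()
    {
        id = Interlocked.Increment(ref nextId);
        lock (table) { table[id] = this; } // the constructor registers 'this'
    }

    // Models the cleanup that, per the analysis above, only happens
    // when the thread receives a quit message.
    public void Unregister()
    {
        lock (table) { table.Remove(id); }
    }

    public static int LiveCount
    {
        get { lock (table) { return table.Count; } }
    }
}

static class LeakModel
{
    // Starts `threads` short-lived threads that each create a context and
    // exit without unregistering it; returns how many contexts remain rooted.
    public static int Run(int threads)
    {
        for (int i = 0; i < threads; i++)
        {
            Thread t = new Thread(delegate() { new FakeThreadContext(); });
            t.Start();
            t.Join();
        }
        return FakeThreadContext.LiveCount;
    }
}
```

Every thread has finished by the time LeakModel.Run returns, yet each context it created is still in the table because Unregister was never called. In the real case each such surviving object also kept a thread handle open.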
Bottom line: don't create controls on threads that don't pump messages, especially if you create lots of threads over the lifetime of the application.
It seems obvious in retrospect, but the application was not using the control visually, so it didn't really need WM_PAINT or the hundred other messages that a control needs to work. The fix was to move control creation to a GUI (message-pumping) thread and then call Invoke/BeginInvoke from the other threads.
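A minimal sketch of that shape of fix follows. It is not the application's actual code: MarshalToUiThreadDemo, the "tick" text and the self-closing form are assumptions made for illustration. The controls are created once, on the thread that runs the message loop, and the worker thread only queues work to it with BeginInvoke.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical demo class; names are illustrative only.
static class MarshalToUiThreadDemo
{
    // Runs a message loop on the calling thread, lets one worker thread queue
    // `ticks` updates to it, and returns how many updates the UI thread applied.
    public static int Run(int ticks)
    {
        int applied = 0;

        // The controls are created once, on the thread that will pump messages.
        Form host = new Form();
        Label label = new Label();
        host.Controls.Add(label);

        host.Load += delegate
        {
            Thread worker = new Thread(delegate()
            {
                try
                {
                    for (int i = 0; i < ticks; i++)
                    {
                        string text = "tick " + i;
                        // Queue the update; the delegate body runs on the UI thread.
                        host.BeginInvoke((MethodInvoker)delegate
                        {
                            label.Text = text;
                            applied++;
                        });
                        Thread.Sleep(100);
                    }
                    // Ask the UI thread to close the form once the work is done.
                    host.BeginInvoke((MethodInvoker)delegate { host.Close(); });
                }
                catch (InvalidOperationException)
                {
                    // The form was closed before the worker finished.
                }
            });
            worker.IsBackground = true;
            worker.Start();
        };

        Application.Run(host); // returns once the form closes
        return applied;
    }
}
```

Calling MarshalToUiThreadDemo.Run(5) from an [STAThread] Main shows the form, applies the queued updates and returns when the form closes. The worker never constructs a control, so it should not trigger the ThreadContext creation seen in the stack trace above.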