Sometime ago I had to troubleshoot a problem in Biztalk Server 2004. I was told that Biztalk Server 2004 apparently stopped working and the only way to get it back on track would be to restart the instances and we didn´t had errors reported. After that everything would run smoothly. The worst part was that this only happened from time to time.
Since this appeared to be a hang we collected three memory dumps with 2-3 minutes apart when the issue showed up again.
The first thing we looked at was threadpool.
0:136> !threadpool
CPU utilization 81%
Worker Thread: Total: 101 Running: 101 Idle: 1 MaxLimit: 250 MinLimit: 100
Work Request in Queue: 1148640
--------------------------------------
Number of Timers: 0
--------------------------------------
Completion Port Thread: Total: 0 Free: 0 MaxFree: 4 CurrentLimit: 0 MaxLimit: 250 MinLimit: 100
I see CPU utilization equals 81%. This usually means we are in middle of a GC. Looking at my threads I can see that thread 102 triggered the GC.
108 0x2ba8 0x116b8228 0x3800220 Enabled 0x00000000:0x00000000 0x000e6dc0 0 MTA (Threadpool Worker)
102 0x1bb8 0x116b8780 0x1800220 Disabled 0x05a77d84:0x05a79d58 0x000e6dc0 1 MTA (GC) (Threadpool Worker)
115 0x22ac 0x116b8948 0x1800220 Enabled 0x00000000:0x00000000 0x000e6dc0 1 MTA (Threadpool Worker)
Looking closer at this thread stack we can see a call to SysSuspendForGC. This means that we are going to suspend all threads to allow the GC to proceed.
0:102> kb
ChildEBP RetAddr Args to Child
120ef0dc 7c822114 77e67143 00000001 120ef12c ntdll!KiFastSystemCallRet [d:\srvrtm\base\ntos\rtl\i386\userdisp.asm @ 545]
120ef0e0 77e67143 00000001 120ef12c 00000001 ntdll!ZwWaitForMultipleObjects+0xc [d:\srvrtm\base\ntdll\daytona\obj\i386\usrstubs.asm @ 2363]
120ef188 77e6109d 00000001 120ef2c8 00000000 kernel32!WaitForMultipleObjectsEx+0x11a [d:\nt\base\win32\client\synch.c @ 1521]
120ef1a4 791feb58 00000001 120ef2c8 00000000 kernel32!WaitForMultipleObjects+0x18 [d:\nt\base\win32\client\synch.c @ 1386]
120ef3dc 791fec7c 00000001 00000001 116b8780 mscorsvr!Thread::SysSuspendForGC+0x248
120ef3f4 791fee14 00000001 00000000 ffffffff mscorsvr!GCHeap::SuspendEE+0xcf
120ef410 7924f4be 00000002 00000000 ffffffff mscorsvr!GCHeap::GarbageCollectGeneration+0x13f
120ef42c 792ef053 ffffffff 00000000 120ef498 mscorsvr!GCHeap::GarbageCollect+0x3c
120ef43c 0b961f3a 120ef448 ffffffff 00000000 mscorsvr!GCInterface::CollectGeneration+0xf
Suspending all threads is allowed if preemptive is true. Otherwise it won´t be possible to suspend the thread. Now looking at my threads there is actually one that has preemptive set to Disable. This is thread 76.
74 0x2fa8 0x0bd2c2e0 0x1800220 Enabled 0x05a9b104:0x05a9d0d8 0x000e6dc0 0 MTA (Threadpool Worker)
76 0x1958 0x0bd2c4a8 0x1800222 Disabled 0x05a95144:0x05a970d8 0x000e9a08 1 MTA (Threadpool Worker)
60 0x18f0 0x0bd2c670 0x3800220 Enabled 0x00000000:0x00000000 0x000e6dc0 0 MTA (Threadpool Worker)
Lets switch to thread 76 and see what it is doing.
0:076> kb
ChildEBP RetAddr Args to Child
0de6e8d4 7c822124 7c83970f 00000174 00000000 ntdll!KiFastSystemCallRet [d:\srvrtm\base\ntos\rtl\i386\userdisp.asm @ 545]
0de6e8d8 7c83970f 00000174 00000000 00000000 ntdll!NtWaitForSingleObject+0xc [d:\srvrtm\base\ntdll\daytona\obj\i386\usrstubs.asm @ 2371]
0de6e914 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c [d:\srvrtm\base\ntdll\resource.c @ 619]
0de6e934 7c837a15 7c889d94 00000000 0de6ea30 ntdll!RtlEnterCriticalSection+0xa8 [d:\srvrtm\base\ntdll\resource.c @ 870]
0de6e968 7c836c23 00000000 00000000 0de6e9b0 ntdll!LdrLockLoaderLock+0x133 [d:\srvrtm\base\ntdll\ldrapi.c @ 2746]
0de6e9e4 7c836cf9 00000001 00000001 00000000 ntdll!LdrGetDllHandleEx+0x94 [d:\srvrtm\base\ntdll\ldrapi.c @ 719]
0de6ea00 77e665e3 00000001 00000000 0de6ea80 ntdll!LdrGetDllHandle+0x18 [d:\srvrtm\base\ntdll\ldrapi.c @ 614]
0de6ea4c 77e92330 0de6ea80 00000000 0de6f160 kernel32!GetModuleHandleForUnicodeString+0x20 [d:\nt\base\win32\client\module.c @ 1278]
0de6eec4 77e663f4 00000001 00000002 792323a4 kernel32!BasepGetModuleHandleExW+0x14f [d:\nt\base\win32\client\module.c @ 1556]
0de6eedc 792322cd 792323a4 00000000 0de6f160 kernel32!GetModuleHandleW+0x29 [d:\nt\base\win32\client\module.c @ 1368]
0de6f130 793cd536 00000000 00000000 119c5740 mscorsvr!LoadImageHlp+0x1e
0de6f148 7932bf1e 00000000 0bd2c4a8 0de6f840 mscorsvr!AssemblyFileHash::GenerateDigest+0xb3
It´s waiting on a critical section. In ntdll!RtlEnterCriticalSection the first parameter is in fact the critical section we are interested on.
0:076> !critsec 0x7c889d94
CritSec ntdll!LdrpLoaderLock+0 at 7c889d94
WaiterWoken No
LockCount 13
RecursionCount 1
OwningThread 9b0
EntryCount 0
ContentionCount 47
*** Locked
The owning thread of this section is thread 9b0 that corresponds to thread 136. Now the funny part is that thread 136 is waiting for GC to complete. Now this looks like a deadlock. No wonder that Biztalk stopped working.