WCF Discovery announcement service leaks memory - by BotHead

Status : 

  Not Reproducible<br /><br />
		The product team could not reproduce this item with the description and steps provided.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

Sign in
to vote
ID 779167 Comments
Status Closed Workarounds
Type Bug Repros 0
Opened 2/13/2013 12:42:59 PM
Access Restriction Public


Over the past few days our team has enabled WCF Service Discovery using ad-hoc discovery into our existing WCF Services.  After some performance profiling we noticed that memory of the service hosting applications appeared to be increasing over time without any service activity on the primary services (i.e. none of our clients were connected or making calls into our services).  After digging around I noticed that when there is discovery traffic the number of dead threads in the servers also appears to increase over time.  I have been having trouble tracing back via WinDbg from a DeadThread ThreadObj (i.e. clr::Thread), but I have done a dumpheap of the existing System.Threading.Thread classes and found many of them to be AsyncPinned in association with System.ServiceModel.Discovery.AnnouncementSendsAsyncResult objects.

So in WinDbg when you run !t you will see something like this:
0:000> !t
ThreadCount:      12
UnstartedThread:  0
BackgroundThread: 5
PendingThread:    0
DeadThread:       6
Hosted Runtime:   no
                                           PreEmptive                                                   Lock
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1  4668 0000000000768880      6020 Enabled  000000016000a598:000000016000b798 00000000007403a0     0 STA
  10    2  43dc 0000000000797460      b220 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 MTA (Finalizer)
XXXX    3       0000000005a22480   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 MTA (Threadpool Completion Port)
XXXX    4       0000000005a23460   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 MTA (Threadpool Completion Port)
XXXX    5       0000000005a06a00   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 Ukn (Threadpool Completion Port)
XXXX    6       0000000005a079e0   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 MTA (Threadpool Completion Port)
XXXX    7       0000000005a14580   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 Ukn (Threadpool Completion Port)
XXXX    8       0000000005a16810   8019820 Enabled  0000000000000000:0000000000000000 00000000007403a0     0 MTA (Threadpool Completion Port)
  17    9  45ec 0000000005a7db60   8009220 Enabled  0000000160013360:0000000160014fd0 00000000007403a0     0 MTA (Threadpool Completion Port)
  18    a  2010 0000000005a4ee00   8009220 Enabled  00000001001a6a80:00000001001a6fd0 00000000007403a0     0 MTA (Threadpool Completion Port)
  19    b  42a4 0000000005aa2820   8009220 Enabled  00000000c01c1b68:00000000c01c2fe8 00000000007403a0     0 MTA (Threadpool Completion Port)
  20    c  2418 0000000005a5d0f0   8009220 Enabled  0000000140459d80:000000014045afd0 00000000007403a0     0 MTA (Threadpool Completion Port)

And if you dig into a few of the managed System.Threading.Threads you will see this sort of gcroot path:

I am reporting this before I have pinned down the exact cause of the issue since like I said I am having trouble using WinDbg to trace back some of the dead threads, but it seems clear that the issue is related to WCF Discovery.

I have labeled this a blocking issue because if WCF Discovery ad-hoc services create a memory issue it is obvciously something we will have to turn off in our product until the issue has been addressed.

The general issue is that when using ad-hoc discovery our WCF services memory grows over time without ever stabilizing, and we suspect that this could be a dead thread leak or a leak of some other resource caused by WCF Discovery causing memory to grow.

Another note is that we are aware that memory will grow initially due to WCF Discoverys internal duplicate message cache that is used to cache UDP messages to certain limit, and we have seen that after those caches have been filled memory still continues to grow over a period of time (several days).
Sign in to post a comment.
Posted by Piyush [MSFT] on 3/11/2013 at 1:40 PM
Thanks for the update. Looks like you were able to find the root cause of the issue. We are resolving the bug as "No Repro". Let us know if you think there is any action item on us. Thanks.
Posted by BotHead on 3/11/2013 at 6:50 AM
After further testing we have discovered that the slow gradual memory growth appears to be caused by a System.Timers.Timer that is running in one of our service components. The slow growth seems to be due to the fact that the timer interval long (1 minute) and that we are manually enabling the timer after each interval to get the desired result rather than allowing the timer to auto reset. We re-tested discovery without this timer and its memory appears to stabilize as expected when our timer is not running and we see no further slow memory growth. The reason we were unable to track this down when using .NET profiler tools (i.e. ANTs) is that the when re-enabling the timer manually the only changes that seem to occur are to the native timer (i.e. no new System.Threading.Timer). I have attached sample code for the timer memory growth we have seen as well (we are not reporting a problem with that growth as that we are assuming is expected and that it eventually levels off and is collected).
Posted by BotHead on 2/27/2013 at 7:54 AM
I've attached a dump and some additional winDbg generated statistics. We are also performing some longer idle testing without discovery enabled to serve as a baseline of comparison to the results with discovery that have shown memory growth. So far with discovery disabled memory stabilizes after about ~30 hours of idle time, however the test has only been running for a couple days so we will let it run a bit longer.
Posted by BotHead on 2/26/2013 at 9:33 AM
Yes, I will attach a dump file and some additional WinDbg information we have collected from our tests. I am having some difficulty getting the files attached to this issue due to their size, but will keep trying. If there is an additional file submission method I can try that as well.
Posted by Piyush [MSFT] on 2/25/2013 at 2:26 PM
Thanks. We are continuing to investigate this. Do you have a dump file you can share with us?
Posted by BotHead on 2/19/2013 at 6:57 AM
Actually when I look at the memory a little closer it does show a clear increase still, there is not the level off that I thought I saw. I must have been zoomed out or looking at the wrong counter. I have attached the perfmon view of private bytes that shows the increase over the past 6 days.
Posted by BotHead on 2/18/2013 at 4:41 PM
Hi, thanks for your support on this! From the memory statistic (assuming private bytes?) I am wondering if you profiled the client process and not the service process, or both and are just referencing the client? The service process's starting private bytes is ~100K, where the client process is ~54K. The memory pattern I have observed has been on the service side (the "WcfDiscoveryDeadThreadLeak.exe" process), not the client side (the "WcfDiscoveryDeadThreadLeakClient.exe" process), which I may have neglected to point out directly. Incidently I have had test running, going on 5 days now, and it appears that memory levels out at about the ~2.5-3 day mark and maintains very well after that.

Please confirm that this is consistent with what you have observed with the service process as well, and if that is the expected behavior we may close this issue. I have attached another vmmap screencap of the 5 day memory view as well for your reference (note that this is using our production application, not the repro, so the size in memory will much larger, but the pattern should be the same as for the repro.
Posted by Piyush [MSFT] on 2/18/2013 at 3:10 PM
Thanks for your report. We ran the repro in our lab but we didn't see the memory leak behavior you described. The process slowly but steadily worked its way up to 65K and then stayed there for over 3 days. Is there anything else in your app?
Posted by BotHead on 2/14/2013 at 6:49 AM
I have updated this bug report to generalize the issue as a memory leak, and that dead thread leaks have been suspected as teh culprit, however it may be the case that some other internal issue with WCF Discovery is causing the memory leak.
Posted by BotHead on 2/14/2013 at 6:20 AM
I also have a winDbg dump saved when this issue occurred in our product. If the sample and information I have provided are not sufficient please let me know if providing that will help as well.
Posted by BotHead on 2/14/2013 at 6:19 AM
I also have a winDbg dump saved off where the problem occurred in our product. If the information and sample I have provided are insufficient I can also provide that upon request.
Posted by Microsoft on 2/13/2013 at 8:08 PM
Thanks for your feedback.

We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow-up with your issue.
Posted by Microsoft on 2/13/2013 at 1:03 PM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly(http://support.microsoft.com)