Finding and mitigating memory leaks in managed code using ANTS

A week or so ago, I got a call from a client who had some issues with an application that I had built. The application is a WPF 4 “media player” that is supposed to run 24/7. The issue they were facing was that it only ran for about 12 hours before crashing. At startup it used less than 100 MB of memory, and at the end (just before crashing) it used about 1500 MB. So it clearly had a memory leak. And I needed to find it…fast!

And since this is the first time I have had to do this, I decided to share my experiences. Hopefully it will help someone…

The application is supposed to play a bunch of ads 24/7. The ads, or slots as I have decided to call them as they are a bit more flexible than regular ads, are currently picture slideshows, videos, or a combination of the two. The application is built in a modular fashion to make it possible to add new types of slots in the future.

The slots are grouped and ordered into playlists, which are downloaded from Azure. And since this post is about memory leaks, I won’t spend too much more time explaining the application. I just want to mention that it loads the controls that handle displaying the “media slots” on the fly based on what is in the playlist. And the dynamically loaded controls are just discarded when done…So they should be garbage collected…

Anyhow…back to the memory leaks…cause I obviously had one…or more…

So what to do when faced with something you haven’t really dealt with? And to be honest, I haven’t dealt a whole lot with memory leaks before as I work in .NET, and the garbage collector hides most of that for us…

Well, I knew there was a memory profiling tool around, and after talking to a colleague it appeared to be a tool called “CLR Profiler” I was thinking of. So I downloaded that, only to discover that it only works with .NET 2… So that was a dead end…

Twitter to the rescue! I sent out a question on Twitter and got a couple of replies within an hour. One of the most common replies was ANTS. So I googled ANTS, and found a tool from Red Gate called “ANTS Memory Profiler”. So I downloaded a trial version and booted it up.

Considering that Red Gate has been responsible for .NET Reflector for a while now, I had quite high hopes…

Anyhow…I started my new “toy” and to be honest I have been around long enough to actually be past the “I don’t read manuals” approach. I even considered googling some “Getting Started” stuff for ANTS. But I didn’t. Which was kind of cool as the first thing you are met by when you launch ANTS is this

ANTS-Memory-Profiler-Screenshots-Homescreen

As you can see, the UI is VERY simple. Not a whole lot in there really. Except for a user friendly screen that tells you where to go from there. So I followed the recommendation and started a new project. As you start a new project, you are met by this screen

ANTS-Memory-Profiler-Screenshots-New-Project

As you can see, it is fairly simple. Just choose what EXE file (if it is a Windows app you are profiling) you want to profile, where to execute it and a few things like that. If you want to, you can probably go in and make a range of tweaks, but I didn’t… But I must say that I am quite curious about putting it on top of one of my Silverlight apps… :)

After having selected my application and pressed start, ANTS works for a little while spinning up the application to profile. As soon as that is done, the profiled application goes about its own business without any concern for the fact that there is a little perverted application watching it.

ANTS on the other hand, probably does a whole lot of things. Not that you see it. Cause all you see is another simple screen (shown below). The only thing in that screen that tells you that something is happening, is a graph of the currently used memory. This really isn’t much more than the Task Manager does. Even though I assume that this data is more accurate than what you get from the Task Manager, which apparently can be a bit off at times…

ANTS-Memory-Profiler-Screenshot-Step-1

But the funky stuff starts when you follow the instructions and push the “Take Memory Snapshot” button. This will tell ANTS to take a snapshot of the current memory…ehh…yeah…sort of self-explanatory…

What happens when it does that, is that you get a copy of the memory as it was at that point in time. You can then walk through this copy and look at what objects were in memory, what objects those objects had references to, as well as what objects they were referenced by.

Taking a snapshot seems to purge the memory a bit. I assume it either forces the GC to run before it walks through the memory, or it automatically runs for some reason. But I did see the memory graph go down a little bit every time I took a snapshot. But that doesn’t really matter, so ignore that rant… Unless you can explain why… :)

The first screen that you see after getting a snapshot looks like this

ANTS-Memory-Profiler-Screenshots-Step-2

As you can see, you get an overview of what objects are in memory. You get to see what types of objects use up the most memory, as well as which ones have the highest instance count. You also get the option to filter the data using the panel on the left hand side…

I won’t cover the filtering, but it makes it easy to find specific problems…

But the really cool thing is that a couple of buttons appear at the top. The “Summary” and “Class List” buttons are the only ones that are enabled at the moment, but that is cool…the others will be enabled when they are needed…

But before I went on to look at those, I needed to get another snapshot to be able to compare object graphs. Why? Well…a memory leak in managed code is basically objects that are kept in memory after they are no longer needed. Normally the garbage collector would remove them for us, but as long as something reachable still holds a reference to them, they stay in memory and thus create a leak. This is easy to simulate by keeping static references to objects, as objects referenced from a static field will not be garbage collected for as long as that reference exists.
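To make that concrete, here is a minimal sketch of such a simulated leak. The names Slot, SlotCache and LeakDemo are made up for illustration and have nothing to do with my actual application. The only point is that the static list is reachable for the lifetime of the process, so nothing added to it can ever be collected…

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for something that takes up a bit of memory
public class Slot
{
    // 1 MB payload so the growth is easy to spot in a memory graph
    private readonly byte[] _payload = new byte[1024 * 1024];

    public int PayloadSize
    {
        get { return _payload.Length; }
    }
}

public static class SlotCache
{
    // A static field acts as a GC root: everything in this list stays reachable
    public static readonly List<Slot> Played = new List<Slot>();
}

public static class LeakDemo
{
    // "Plays" a number of slots and returns how many are being kept alive
    public static int PlaySlots(int count)
    {
        for (var i = 0; i < count; i++)
        {
            var slot = new Slot();
            // The slot is never removed again, so it can never be collected
            SlotCache.Played.Add(slot);
        }
        return SlotCache.Played.Count;
    }
}
```

Calling LeakDemo.PlaySlots in a loop and taking a snapshot every now and then should show the Slot instance count climbing from snapshot to snapshot, which is the kind of pattern to look for.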

So by having 2 snapshots, I can compare the instance count and memory usage between the two snapshots. By doing this, I can see what type of object is not being released, and then try to see why they aren’t released.

More about this later…but first another snapshot…

ANTS-Memory-Profiler-Screenshots-Step-3

As you can see, I now get 2 columns of information. In this case, the left one contains stats from the first snapshot, and the right one contains stats from the second one. If you take more than 2 snapshots, you can select which snapshots to compare…

While debugging my app, I took about 40-50 snapshots over 6 hours. Or rather…after I had found the leak I did. Cause to be honest, I only needed 3 snapshots with a couple of slots being played between each to see where my problem was.

I will get back to my specific problem later, but let’s look at some more features in the ANTS application first…

Clicking the “Class List” button gives us the following view

ANTS-Memory-Profiler-Screenshots-Step-4

Or one very similar at least. It sort of depends on the app you are profiling of course... In this case, it happens to show data from my application, and I have also selected the “Group by namespace” option. This option makes it easy to figure out what classes are from what namespaces. And since I was pretty sure it was in my own code, I could focus on those namespaces…

Expanding a namespace shows a list of the types from that namespace. And here, you also get a comparison between the two snapshots. So all I needed to do to find my leak, was to compare these statistics for my types and see if there were any specific types whose instance count just kept going up over time.

In a perfect world, the GC would come around and collect all unused objects as needed and therefore keep the instance count relatively steady. But if they are for some reason never released, the instance count would just keep growing as the GC would not see them as collectable…and in my case the instance count just kept rising…
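ANTS does this comparison for you in the UI, but the idea itself is simple enough to sketch in a few lines. Note that the snippet below is not ANTS code and does not use its API. It just assumes that you have somehow ended up with one type-name-to-instance-count dictionary per snapshot, and SnapshotDiff is a name I made up…

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotDiff
{
    // Returns the types whose instance count grew from every snapshot to the next.
    // Each dictionary maps a type name to its instance count in one snapshot.
    public static List<string> FindSteadilyGrowingTypes(
        IList<Dictionary<string, int>> snapshots)
    {
        var suspects = new List<string>();
        if (snapshots.Count < 2)
            return suspects; // Nothing to compare against

        foreach (var typeName in snapshots[snapshots.Count - 1].Keys)
        {
            var growing = true;
            for (var i = 1; i < snapshots.Count && growing; i++)
            {
                int before, after;
                // TryGetValue leaves the count at 0 if the type is missing
                snapshots[i - 1].TryGetValue(typeName, out before);
                snapshots[i].TryGetValue(typeName, out after);
                growing = after > before;
            }
            if (growing)
                suspects.Add(typeName);
        }

        suspects.Sort(StringComparer.Ordinal);
        return suspects;
    }
}
```

A type that stays at 10 instances in every snapshot is not reported, while one that goes from 2 to 5 to 9 is. A type that only grows between some of the snapshots is left out as well, which is a deliberately strict choice to cut down on noise from objects that the GC simply hasn’t gotten around to yet.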

And the cool thing is that, when you have identified what class type is not released, you can select it in the list and click the “Class Reference Explorer” or “Instance List” button to get even more information…

The “Instance List” button gives us a view that looks like this

ANTS-Memory-Profiler-Screenshots-Step-5

As you can see, it shows a list of each of the instances of that type. It also gives you information about size and how far away it is from the GC Root, if it is a new object and so on.

But my personal favorite is the “Class Reference Explorer”, as it is a graphical representation that quite clearly indicates why your object is still in memory…

ANTS-Memory-Profiler-Screenshots-Step-6

It basically places your selected type in the middle. Any type referencing it to the left. And any type being referenced by it to the right. So the types on the left are keeping it in memory, and the types on the right are kept in memory because of this type.

It also shows a little bar that indicates how big a percentage of your instances of the selected type is referenced by a type or is referencing another type.

In the image above, all of my PlaylistSlots are referenced by a list of IPlaylistSlots, while only 10% (one of my 10 instances) are referenced by a view model called MainWindowViewModel. All of the PlaylistSlots in turn reference a Dictionary and a string.

And by looking at that, I can figure out that my PlaylistSlots will not be able to be garbage collected until the list of IPlaylistSlots releases them. Even if my MainWindowViewModel releases them…

But in this case I actually expect there to be 10 instances…

If I on the other hand started to pile up loads of instances that weren’t expected to be there, it would be easy for me to identify which class is referencing them and therefore keeping them in memory. And I could then look at the code and try to identify a reason for them not being released.

And if you see instances of types being kept in memory when they shouldn’t be, and you want to know more, you can select one of the instances in the “Instance List” and click on the “Object Retention Graph” button. This brings up the following view…

ANTS-Memory-Profiler-Screenshots-Step-7

This view shows you exactly what types are keeping references to your instance. And it even has a big red bubble telling you where to start looking.

So you start at the bottom, which is where you find your instance. You can then traverse up the object graph and see every single object that has a reference to it, and the objects that have a reference to that and so on until you reach the top level object.

So…now that I have this awesome tool to use to look at my memory, what did I find? Well…it wasn’t that surprising as such. Most managed code will handle garbage collection nicely. Unfortunately, my application interoperated with unmanaged code in a couple of its controls. Especially in the controls showing videos, as these controls used a Windows Media Player ActiveX control through an interop assembly.

And why would I do that? Well…I feel a need to defend myself… The application runs on low performance Atom based machines. These machines also host a couple of other apps at the same time, which causes them to have some performance issues. And apparently, the MediaElement in WPF is less “performant” (I know…it isn’t a real word) than the WMP ActiveX. So by using interop and WMP, I got the machine to play 720p video without stuttering and locking up the other parts of the system…

But using interop means that you have to be a little careful. Especially when adding event handlers apparently. You need to remember to detach them when you are done with the ActiveX control. And to be honest, we should be trying to remember to do so even with our managed code. But it doesn’t normally cause problems.

In my case, I kept creating new controls on the fly. These controls in turn created WMP objects and added event handlers to them. And when I discarded the WPF controls, the WMP object was holding it all in memory, slowly leaking enough memory to kill my application…

This is apparently a fairly common scenario considering the number of blog posts about it. So take a look at this before doing too much debugging if you leak memory. ANTS even has a filter that only shows you objects that are only kept in memory due to event handlers. (Maybe I should have tried that… :) )
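The mechanics are easier to see in a purely managed sketch. The Player and VideoSlotControl classes below are made-up stand-ins, not the real WMP interop types, and the sketch assumes the publisher of the event outlives the control. The thing to notice is that subscribing hands the publisher a delegate that points back at the subscriber, and that reference stays until the handler is detached…

```csharp
using System;

// Stand-in for the long-lived player object (hypothetical, not the WMP interop API)
public class Player
{
    public event EventHandler PlaybackEnded;

    // Number of handlers the player is currently holding on to
    public int SubscriberCount
    {
        get
        {
            var handlers = PlaybackEnded;
            return handlers == null ? 0 : handlers.GetInvocationList().Length;
        }
    }

    public void RaisePlaybackEnded()
    {
        var handlers = PlaybackEnded;
        if (handlers != null)
            handlers(this, EventArgs.Empty);
    }
}

// Stand-in for a control that is created on the fly and later discarded
public class VideoSlotControl : IDisposable
{
    private readonly Player _player;

    public int EndedCount { get; private set; }

    public VideoSlotControl(Player player)
    {
        _player = player;
        // From here on the player references this control through the delegate
        _player.PlaybackEnded += OnPlaybackEnded;
    }

    private void OnPlaybackEnded(object sender, EventArgs e)
    {
        EndedCount++;
    }

    public void Dispose()
    {
        // Detaching removes the player's reference back to this control
        _player.PlaybackEnded -= OnPlaybackEnded;
    }
}
```

As long as Dispose is never called, every discarded VideoSlotControl is still reachable through the player’s event, which is exactly the kind of chain that shows up in the retention graph.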

So… What can we do when we have located the leak? Well…implementing a good resource dispose mechanism is a good way to start… And what is a good way of doing this? Well…any class that uses external resources should be implementing IDisposable. I think we all know that.

But what if the class isn’t being disposed properly? Well…a destructor (a finalizer in CLR terms) will handle that as a safety net. This can however become quite complex if you start to introduce inheritance and so on.

A long time ago, I saw a Microsoft pattern for handling this, and I have used that in most cases where I needed to handle disposing resources on my own. And I think it handles it quite nicely. Especially since it offers the ability to get nice resource management even in an inheritance chain, which otherwise can become somewhat complicated as mentioned before…

A lot of you have probably already seen this pattern, but it is worth mentioning again…

The first step is to implement the IDisposable interface

public class DisposableClass : IDisposable
{
    public void Dispose()
    {
        // Release any managed resources
        // Release any unmanaged resources
    }
}

And then we need a destructor for the unmanaged resources

public class DisposableClass : IDisposable
{
    public void Dispose()
    {
        // Release any managed resources
    }

    ~DisposableClass()
    {
        // Release any unmanaged resources
    }
}

Why do we need both of these? Well, in the dispose method, we can dispose any other disposable objects that we have reference to and make sure that we release those resources. But if that isn’t called, the destructor will at least make sure that we release any unmanaged resources that the garbage collector can’t handle.

The problem is that the managed objects that we reference are also likely to be up for garbage collection at the time that the current object’s destructor is called. And since the GC doesn’t guarantee that destructors will run in any particular order, any objects that we reference in our object might already have been destructed. So in the destructor, we aren’t “allowed” to touch any managed objects. We can set our references to null, but we should never try calling Dispose or any other methods on them as they might be “gone”… And if you start calling methods on objects that have already been destroyed, you are likely to cause exceptions to be thrown, and an unhandled exception in a destructor will by default take down the whole process… So don’t!

The funky thing is that both of these methods do sort of the same thing but in different situations. So we can combine them to this

public class DisposableClass : IDisposable
{
    void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release any managed resources
            // Call dispose etc
        }
        // Release any non-managed resources
        // Set references to null
    }

    public void Dispose()
    {
        Dispose(true);
        // Everything is released, so tell the GC to skip the destructor
        GC.SuppressFinalize(this);
    }

    ~DisposableClass()
    {
        Dispose(false);
    }
}

Now all code is in one place keeping it DRY (Don’t Repeat Yourself).

There are two things left though. First of all, you must remember to keep track of your disposed state. After an object is disposed, a user should never be allowed to work with it again. So remember to store that information and check it every time something is called… It is tedious I know…but it makes the developer experience better as it makes sure that you don’t work with disposed objects that might be in a transient state. Cause a disposed object is still in memory, but it has potentially released necessary resources…

The second thing to remember is that the current implementation causes “issues” when inheriting this class. But if we make the Dispose(bool) protected and virtual, inheriting classes can easily hook into the disposing functionality by overriding it and disposing its resources. Just remember to call the base class’ implementation as well!

using System;

public class DisposableClass : IDisposable
{
    public void DoSomething()
    {
        VerifyIsDisposed();
        // Implementation
    }

    // Throws if the object has already been disposed
    protected void VerifyIsDisposed()
    {
        if (IsDisposed)
            throw new ObjectDisposedException(GetType().Name);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (IsDisposed)
            return; // Makes repeated Dispose calls harmless

        if (disposing)
        {
            // Release any managed resources
            // Call dispose etc
        }
        // Release any non-managed resources
        // Set references to null
        IsDisposed = true;
    }

    public void Dispose()
    {
        Dispose(true);
        // Everything is released, so tell the GC to skip the destructor
        GC.SuppressFinalize(this);
    }

    ~DisposableClass()
    {
        Dispose(false);
    }

    private bool IsDisposed { get; set; }
}
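The inheritance part is probably easiest to show with a small example. This is just a sketch: DisposableBase is a trimmed-down stand-in for the class above, and VideoSlot with its MemoryStream is something I made up to have a managed resource to release…

```csharp
using System;
using System.IO;

// Trimmed-down stand-in for the base class (hypothetical name)
public class DisposableBase : IDisposable
{
    protected bool IsDisposed { get; private set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        // The base class has nothing of its own to release in this sketch
        IsDisposed = true;
    }

    ~DisposableBase()
    {
        Dispose(false);
    }
}

// Hypothetical subclass that owns a managed, disposable resource
public class VideoSlot : DisposableBase
{
    private MemoryStream _buffer = new MemoryStream();

    public bool BufferReleased
    {
        get { return _buffer == null; }
    }

    protected override void Dispose(bool disposing)
    {
        // Only touch other managed objects when called from Dispose(),
        // never when called from the destructor
        if (disposing && _buffer != null)
        {
            _buffer.Dispose();
            _buffer = null;
        }

        // Always let the base class release its own resources as well
        base.Dispose(disposing);
    }
}
```

The subclass doesn’t need its own destructor or its own Dispose() method. The base class already has both, and they end up in the overridden Dispose(bool) through the virtual call.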

I think that is it…

I hope it did give you some information that you needed, or at least verified what you thought. But if it didn’t, feel free to ask me or even tell me that I am wrong. But if you are going to tell me that I am wrong, please explain why I am wrong so I can improve… :)

Cheers!

Comments (2)

Aneesh Kamble 1/31/2011 12:52:12 PM

Hi ZeroKoll,
That was really a nice and useful article which is helping me a lot.
I have a similar problem. When I take the snapshot, around 35 MB of unmanaged resources are not being released.

How do I find out which unmanaged resources are causing the problem?

I had used Flash (.swf) files and cached them.
I thought caching was causing the problem, but when I removed the cache, I still had the same problem.

Could you please tell me how to detect unmanaged resources in the ANTS profiler?


Thanks

Hi Aneesh!
As far as I know, the ANTS profiler will only look at the managed memory. However, if you are getting memory leaks from a managed app that don't show up in ANTS, I think the problem lies somewhere outside of your application. In my case, the unmanaged resources were kept in memory due to managed resources not being cleared properly.
My only suggestion is to look through your use of the unmanaged resources and make sure that you release them.
And also, if your leak increases over time, I suggest checking with ANTS several times and trying to find any type whose instance count keeps growing.
It is also possible to tell ANTS to take a snapshot from code. So if you are looping through something or so, and need to have a snapshot after each loop, that is possible...
Hope this helps!
Cheers!
