Java developers chase a memory leak in Python


You take some things for granted when you work in one stack for a long time. For example, you assume that having the proper tools for whatever you need to do is standard nowadays. The truth is that sometimes you have to appreciate what you have, because you’ll miss it when you find out it isn’t at your disposal. Like memory usage monitoring.

We’ll use a real-life example of how we troubleshot a memory issue in Python, when our background, and our developers, came from Java. Our first reaction was to start monitoring the memory with a tool like VisualVM. Surprisingly, a similar tool to monitor memory usage doesn’t exist for Python. That seemed a bit unbelievable, but we cross-checked it with experienced Python developers.

Source: https://visualvm.github.io/

So, with the help of our staff engineer, we searched for the right tool to locate the memory leak, whatever the root cause was.

We tried three different approaches and used a variety of tools to locate the issue:

  • Method #1: A simple decorator for execution time
  • Method #2: Using memory_profiler
  • Method #3: Calculating object sizes

Method #1: A simple decorator for execution time

Decorators are widely used in Python; they are easy, fast, and convenient.

How it works: you write a simple wrapper that calculates the execution time of a code block. Take the time at the beginning and at the end, then calculate the difference. The result is the duration of the execution in seconds.

Then, you can use the decorator in various functions that you think might be the bottleneck.

First, you create the function:

from functools import wraps
from time import time


def measure_time(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()  # timestamp before the call
        result = f(*args, **kw)
        te = time()  # timestamp after the call
        # print the function name and its duration in seconds
        print(f"{f.__name__} {te - ts}")
        return result
    return wrap

Then, the usage is really simple (like a Java annotation):

from foo import measure_time

@measure_time
def test(input):
   # some logic
   # a few lines more
   return output

Console (function name and execution time):

test 500.9122574329376

This would be the junior’s approach to debugging memory issues, kind of like using print statements to debug. Nonetheless, it proved to be an effective and fast way to get an idea of what’s going on in the code.

Pros:

  • Easy to implement
  • Easy to understand

Cons:

  • Adds throwaway code that needs to be deleted afterwards
  • If your app is big, you end up adding too many measurements
  • Works better if you have a hunch about where the issue is located
  • It doesn’t actually measure memory (see the tracemalloc sketch after this list)
  • The method needs to execute successfully for you to see a result (if it crashes, you won’t get anything)
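
Since the last two cons are this decorator’s biggest weaknesses, here is a minimal sketch of a memory-measuring variant built on the standard-library tracemalloc module. This is our own illustration, not something we used at the time:

from functools import wraps
import tracemalloc


def measure_memory(f):
    @wraps(f)
    def wrap(*args, **kw):
        tracemalloc.start()  # begin tracking Python allocations
        try:
            return f(*args, **kw)
        finally:
            # report even if the wrapped function raises
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{f.__name__} current={current} B, peak={peak} B")
    return wrap

The try/finally ensures a measurement is printed even when the function crashes, which addresses the last con above.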

Method #2: Using memory_profiler

A more “memory-focused” approach is memory_profiler. It’s a Python package that can be installed via pip and used as a decorator.

The profiler tracks memory changes and produces a report with measurements for each line.

Installation:

pip install -U memory_profiler

Then, the usage is really simple (like a Java annotation):

from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

Source: https://pypi.org/project/memory-profiler/

The profiler output will be something like this:

 Line #    Mem usage    Increment  Occurrences   Line Contents
============================================================
     3   38.816 MiB   38.816 MiB           1   @profile
     4                                         def my_func():
     5   46.492 MiB    7.676 MiB           1       a = [1] * (10 ** 6)
     6  199.117 MiB  152.625 MiB           1       b = [2] * (2 * 10 ** 7)
     7   46.629 MiB -152.488 MiB           1       del b
     8   46.629 MiB    0.000 MiB           1       return a

Source: https://pypi.org/project/memory-profiler/

A more realistic example:

from memory_profiler import profile

def get_c():
  return (10 ** 6)

def get_d():
  return (2 * 10 ** 7)

def calculate_a():
  return [1] * get_c()

def calculate_b():
  return [2] * get_d()

@profile
def test():
  a = calculate_a()
  b = calculate_b()
  output = a + b
  return output

The output:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    38     20.2 MiB     20.2 MiB           1   @profile
    39                                         def test():
    40     27.7 MiB      7.5 MiB           1      a = calculate_a()
    41    180.4 MiB    152.6 MiB           1      b = calculate_b()
    42    340.7 MiB    160.4 MiB           1      output = a + b
    43    340.7 MiB      0.0 MiB           1      return output

As you can see, there is no info about methods get_c() and get_d(). That’s an issue when you have a big app with many methods calling each other; you have to add the decorator to each method you think might be causing the problem (just like method #1 for time measurement).

Keep in mind that this method, depending on your app size, may make your code significantly slower. In our case, when we needed to measure the parsing of a big file that was causing the memory leak, memory_profiler didn’t finish at all.
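
If line-by-line profiling is too heavy, the same package also ships an mprof command, which samples the process’s overall memory usage over time instead of instrumenting every line, so the overhead is much lower (my_script.py below is a placeholder name):

mprof run my_script.py
mprof plot

The first command records memory samples while the script runs; the second plots them.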

Pros:

  • Easy to implement
  • Understandable

Cons:

  • Adds throwaway code that needs to be deleted afterwards
  • If your app is big, you add too many measurements
  • Works better if you have a hunch about where the issue is located
  • Depending on your app size, it can be very slow or unable to measure
  • If it crashes, you learn nothing about the issue

Method #3: Calculating object sizes

If you are storing data in objects, a good approach is to check the objects’ sizes. That way, you can narrow down the suspicious code snippets that might be causing the problem.

One way of doing this is by using sys.getsizeof(obj), which returns the (shallow) size of an object in bytes.

How to implement:

import sys

# calculate_a() and calculate_b() are the helpers from the memory_profiler example
def test():
  a = calculate_a()
  print(f"Size of object a:\t\t{sys.getsizeof(a)}")
  b = calculate_b()
  print(f"Size of object b:\t\t{sys.getsizeof(b)}")
  output = a + b
  print(f"Size of object output:\t{sys.getsizeof(output)}")
  return output

The output:

Size of object a: 8000056
Size of object b: 160000056
Size of object output: 168000056
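
These numbers add up: on a 64-bit CPython build a list stores an 8-byte pointer per element, so a holds 10**6 * 8 = 8,000,000 bytes plus 56 bytes of list overhead, matching the 8,000,056 reported above.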

While working on that approach, we bumped into an article about a similar case of troubleshooting. Check it out: The Strange Size of Python Objects in Memory.
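
The core point of that article, as we understood it, is that sys.getsizeof() is shallow: it counts the container itself, not the objects the container points to. A quick illustration of our own:

import sys

row = [list(range(1000)) for _ in range(100)]  # 100 inner lists

print(sys.getsizeof(row))     # outer list only: roughly 100 pointers
print(sys.getsizeof(row[0]))  # a single inner list is bigger than that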

Based on that, and since our objects were really complex, with many children and a lot of collections and attributes, we decided to take a look at the actual (deep) size calculation. This could help us narrow down the culprit.

import sys
import gc

def actualsize(input_obj):
    # deep size: walk every object reachable from input_obj, counting each once
    memory_size = 0
    ids = set()  # ids of objects already counted
    objects = [input_obj]
    while objects:
        new = []
        for obj in objects:
            if id(obj) not in ids:  # skip anything we have already seen
                ids.add(id(obj))
                memory_size += sys.getsizeof(obj)
                new.append(obj)
        objects = gc.get_referents(*new)  # descend into referenced objects
    return memory_size

Source: https://towardsdatascience.com/the-strange-size-of-python-objects-in-memory-ce87bdfbb97f

Now we have to alter the code to get the actual size like this:

def test():
  a = calculate_a()
  print(f"Size of object a:\t\t{actualsize(a)}")
  b = calculate_b()
  print(f"Size of object b:\t\t{actualsize(b)}")
  output = a + b
  print(f"Size of object output:\t{actualsize(output)}")
  return output

This produces a slightly different output:

Size of object a: 8000084
Size of object b: 160000084
Size of object output: 168000112

As you can see, there is a difference in the output object (168000112 instead of 168000056, 56 bytes more). As I mentioned earlier, our object structure is a bit complicated, with multiple references to the same instances. As a result, the actual size calculation was way off and not that helpful. However, this method really does have potential.
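
To see why shared references skew the numbers, here is a small illustration of our own (not our production code), reusing actualsize() from above. Because each distinct object is counted only once, two lists that look identical can report very different deep sizes:

shared = list(range(1000))

a = [shared, shared]                        # two references to one list
b = [list(range(1000)), list(range(1000))]  # two separate lists

print(sys.getsizeof(a) == sys.getsizeof(b))  # True: shallow sizes match
print(actualsize(a) < actualsize(b))         # True: the shared child is counted once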

Pros:

  • Easy to implement
  • Easy to understand

Cons:

  • Adds throwaway code that needs to be deleted afterwards
  • If your app is big, you add too many measurements
  • Works better if you have a hunch about where the issue is located

Conclusion

We spoke about the methods, but how did we solve the problem? Which one of these roads led us to the dreaded memory hog?

Let’s talk about what the actual problem was. There was a piece of code calling copy(obj) to create clones of data. At low volume this wasn’t an issue, since only a few objects were being reused. But when we scaled up, we ended up with an enormous number of cloned objects, none of which shared references, even though many of them held identical values.
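
As a hypothetical illustration of the bug (our real data model is far more complex), cloning identical, read-only data for every record multiplies memory, while sharing a single instance does not:

import copy

template = {"codes": list(range(1000)), "region": "worldwide"}

# the bug, in spirit: every record gets its own deep copy of identical data
cloned = [copy.deepcopy(template) for _ in range(10_000)]

# the fix, in spirit: identical read-only data is shared by reference
shared = [template] * 10_000

With actualsize() from method #3, the cloned version reports roughly 10,000 times the deep size of the shared one.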

Did any of these methods solve the problem? Not really. In our case, they didn’t provide enough data or indicators to locate the issue. The value of everything above is that, by studying more and more, we understood better how Python manages memory, objects, references, and so on. That gave us enough knowledge and insight to locate the issue. After all, we were the ones who implemented it. We had a hunch about where the issue might be located, but we needed proof. We didn’t get any during the troubleshooting, but after fixing the issue we used all three methods for further performance improvements. They were really helpful in the optimization process.

The biggest mistake we made was that in the early stages of implementation we didn’t run a benchmark with huge files to check if and how we could scale up. We did it after a couple of months of implementation. Obviously, by then it was a bit late to consider the design a failure and start over; we had already invested too much time to just scrap it. Thankfully, it wasn’t a design flaw, but a simple bug causing a memory leak. Rookie mistake.

Lesson learned: Even if you are really confident in your design and implementation, perform stress tests and check the scalability of the app as early as possible. It’s the best way to avoid redesign and refactoring.

Dimitris Nikolouzos

Engineering Manager | Music Standards @ ORFIUM

https://www.linkedin.com/in/dnikolouzos/

https://github.com/jnikolouzos

Contributors

Panagiotis Badredin, Backend Software Engineer @ ORFIUM

https://www.linkedin.com/in/panagiotis-badredin-01b0754b/

Antonis Markoulis, Staff Engineer @ ORFIUM

https://www.linkedin.com/in/anmarkoulis/

Alexis Monte-Santo, Software Engineer @ ORFIUM

https://www.linkedin.com/in/alexis-monte-santo-0277b8116/