Tuesday, 20 July 2010

Object memory use in Python

There are a number of libraries and tips for working out how much memory a Python object is using. Some solutions try to analyse which data structures the object uses and calculate a size from that. Others settle for a rough approximation, such as simply going by the size of the object's pickled representation.
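
A minimal Python 3 sketch of the pickle-size approximation follows. It reports the byte length of the serialised form, which roughly tracks how much data an object holds but says nothing about the live object's overhead (object headers, pointers, over-allocated buffers). `pickled_size` is a hypothetical helper name, not a library function.

```python
import pickle

def pickled_size(ob):
    """Rough size estimate: the byte length of the object's pickled form.

    This measures the serialised representation, not the memory the live
    object occupies, so treat it only as a ballpark figure.
    """
    return len(pickle.dumps(ob, protocol=pickle.HIGHEST_PROTOCOL))

print(pickled_size(list(range(1000))))
print(pickled_size("x" * 1000))
```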

This is how I used to be able to do it:

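import sys
import cPickle

def GetMemoryUsage(ob):
    # Serialise the object, then compare allocator usage before and
    # after rebuilding a fresh copy of it from the pickle.
    s = cPickle.dumps(ob)
    memUsed = sys.getpymemalloced()  # only exists in our patched Python build
    ob2 = cPickle.loads(s)           # ob2 must stay alive until the second reading
    return sys.getpymemalloced() - memUsed

Of course, there is no getpymemalloced function in the sys module that ships with a standard Python installation. It was a custom function provided by a patched version of Python, which did the required calculation based on the interpreter's internal memory allocation structures.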

Without this custom function, there isn't even an existing API function that can be abused to get this number. Well, there is one, but it only works in debug builds, and it generates a block of text that merely includes this information.
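
On Python 3.4 and later (long after this post was written), the standard tracemalloc module offers a rough stand-in for the patched getpymemalloced(). The sketch below keeps the shape of GetMemoryUsage: pickle the object, then measure how much traced memory a freshly unpickled copy adds. `get_memory_usage` is a hypothetical helper name. Treat the result as approximate: objects the interpreter shares or recycles (small integers, interned strings, free-list reuse) may not show up as new allocations, and only memory allocated through Python's own allocators is traced.

```python
import pickle
import tracemalloc

def get_memory_usage(ob):
    """Approximate the memory used by a fresh copy of `ob`.

    Assumption: tracemalloc's traced-memory counter is an acceptable
    substitute for the patched sys.getpymemalloced() used above.
    """
    data = pickle.dumps(ob, protocol=pickle.HIGHEST_PROTOCOL)
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        copy = pickle.loads(data)  # keep the copy alive while measuring
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        # Only stop tracing if this function was the one that started it.
        if not was_tracing:
            tracemalloc.stop()
    del copy
    return after - before

print(get_memory_usage({"numbers": list(range(1000))}))
```

Starting and stopping tracing only when it was not already active avoids interfering with a caller that is already using tracemalloc.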

6 comments:

  1. There's the recently introduced sys.getsizeof().

  2. This seems flawed. I'm guessing it sums the heaps in use. The problem is that allocators overallocate (especially pool allocators), so most of the time you may get the right value, but if you trigger an expansion of the pool, this reports the heap expansion rather than the individual allocation.

    I'm curious... is there a reason you're not using the __sizeof__/sys.getsizeof(obj) machinery in Python 2.6 and later?

  3. We currently use two builds of Python 2.5: a modified one with the function I mentioned, and an unmodified one.

  4. ferringb: I believe it sums the used parts of the heaps in use, not the full size of the allocated heaps regardless of how much of each is actually used.

    jcalderone, ferringb, marius: Let's be clear that getsizeof() is nowhere near an equivalent solution; it requires the user to pass every involved object through it. Have a list? Fine. Add an entry to the list? The return value for that list is the same. To work out how much memory the list is really using, you would need to iterate over every element within it, every element within each of those elements, and so on.

  5. > Have a list? Fine. Add an entry to a list? The return value for that list is the same.

    That's because, in most sane languages with variable-size lists, lists don't grow by one element each time. They keep a buffer, and when it fills up on append they grow the container by a bunch of slots at once (extend and list literals are different; in Python they seem to size the list exactly).

    On 2.6, as I append items to an empty list, the size bumps happen at the 5th, 9th and 17th appends (of the first 20).

    Though you are indeed correct that getsizeof is not recursive, at least on builtin lists: a literal one-element list is always 80 bytes, whether you put in an integer (size 24) or a 1000-char string (size 1040).

    And FWIW, you could probably just call getsizeof on your pickled string. Pickle is nowhere near exact either; I don't believe it's a raw memory dump.

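
Comments 4 and 5 both turn on the fact that sys.getsizeof() is shallow. The following Python 3 sketch illustrates both points. `deep_getsizeof` walks built-in containers recursively and sums getsizeof() for every object it reaches, counting shared objects once. `list_growth_points` records at which appends an initially empty list's reported size changes. Both are hypothetical helper names, not library functions. The walk ignores instance `__dict__`s, `__slots__` and other non-container references, so it undercounts arbitrary objects, and exact byte counts and growth points vary between Python versions and platforms.

```python
import sys

def deep_getsizeof(ob, seen=None):
    """Sum sys.getsizeof() over `ob` and everything reachable through
    dicts, lists, tuples and sets, counting each object only once."""
    if seen is None:
        seen = set()
    if id(ob) in seen:
        return 0
    seen.add(id(ob))
    size = sys.getsizeof(ob)
    if isinstance(ob, dict):
        for key, value in ob.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(ob, (list, tuple, set, frozenset)):
        for item in ob:
            size += deep_getsizeof(item, seen)
    return size

def list_growth_points(n=20):
    """Return the 1-based append counts at which an initially empty
    list's sys.getsizeof() value changes."""
    items = []
    last = sys.getsizeof(items)
    points = []
    for i in range(1, n + 1):
        items.append(None)
        size = sys.getsizeof(items)
        if size != last:
            points.append(i)
            last = size
    return points

one_int = [10**6]
one_str = ["x" * 1000]
# Shallow sizes match; deep sizes reflect what the lists actually hold.
print(sys.getsizeof(one_int), sys.getsizeof(one_str))
print(deep_getsizeof(one_int), deep_getsizeof(one_str))
print(list_growth_points())
```

Keeping a set of ids means an object referenced twice is only counted once, which matters for structures that share strings or sub-lists.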