Incorrect bzrlib usage or a memory leak?

Abhay Mujumdar amujumdar at blackducksoftware.com
Mon Oct 3 11:50:13 UTC 2011


To get the content of every file at every revision, I was shelling out to 'bzr cat' with the right parameters. Forking a process for each file revision is pretty slow, probably because the Python runtime is started and the bzr code is loaded on each invocation.
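
For reference, the shell-out approach can be sketched roughly as follows. This is a minimal sketch, not the exact code: the helper names build_bzr_cat_argv and bzr_cat_subprocess are made up for illustration, and the flags are assumed to mirror the bzrlib call shown further down ('-r' for the revision, '--name-from-revision' for the path lookup).

```python
import subprocess


def build_bzr_cat_argv(filename, revision):
    # Hypothetical helper: builds the command line for one 'bzr cat' call.
    # The flag choice is an assumption, picked to match
    # revision=[spec], name_from_revision=True in the bzrlib version.
    return ["bzr", "cat", "--name-from-revision", "-r", revision, filename]


def bzr_cat_subprocess(repository_path, filename, revision):
    # One child process per file revision. Each call pays for starting
    # the Python runtime and loading bzr, which is what makes it slow.
    return subprocess.check_output(
        build_bzr_cat_argv(filename, revision), cwd=repository_path)
```

Every call starts from a fresh process, so nothing can accumulate between calls; the cost is the startup time on each invocation.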

So I rewrote it to use the bzrlib API, and it is an order of magnitude faster. However, it leaks 2-10MB per revision. I tweaked the code to print the heap using heapy. The code and stats are below. Notice that, among other things, the count of bzrlib._static_tuple_c.StaticTuple objects keeps increasing.

I suspect I am not using the API correctly (maybe the cmd_* classes are not supposed to be used in a loop?).

I'd appreciate any help or hints.

Thanks
Abhay

from bzrlib.builtins import cmd_cat
from bzrlib.builtins import cmd_log
from bzrlib.revisionspec import RevisionSpec
import StringIO
import os
from guppy import hpy
hp = hpy()

# Get contents of a file for specific revision.
def bzr_cat(repository_url, filename, revision):
  print "=cat file"
  print hp.heap()
  os.chdir(repository_url)
  spec = RevisionSpec.from_string(revision)
  output = StringIO.StringIO()
  cmd = cmd_cat()
  cmd.outf = output
  cmd.run(filename=filename, revision=[spec], name_from_revision=True)
  cmd.cleanup_now()
  val = output.getvalue()
  output.close()
  print hp.heap()
  return val

# Output of heapy when script started
Partition of a set of 111157 objects. Total size = 14541184 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0  50169  45  5017480  35   5017480  35 str
    1  30786  28  2947480  20   7964960  55 tuple
    2   8229   7   987480   7   8952440  62 function
    3   8449   8   946288   7   9898728  68 types.CodeType
    4    839   1   838768   6  10737496  74 dict of type
    5    290   0   773152   5  11510648  79 dict of module
    6    999   1   769392   5  12280040  84 dict (no owner)
    7    843   1   731648   5  13011688  89 type
    8    658   1   336416   2  13348104  92 dict of class
    9    209   0   217360   1  13565464  93 dict of bzrlib.option.Option

# Output of heapy after a few thousand calls
Partition of a set of 437390 objects. Total size = 334836816 bytes.
Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
    0 211711  48 302202312  90 302202312  90 bzrlib._static_tuple_c.StaticTuple
    1 160872  37 19002840   6 321205152  96 str
    2   1369   0  4491920   1 325697072  97 dict (no owner)
    3  31504   7  3026832   1 328723904  98 tuple
    4   8403   2  1008360   0 329732264  98 function
    5   8643   2   968016   0 330700280  99 types.CodeType
    6    866   0   863008   0 331563288  99 dict of type
    7    296   0   784000   0 332347288  99 dict of module
    8    871   0   755808   0 333103096  99 type
    9    662   0   337504   0 333440600 100 dict of class
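
The two heapy snapshots above can be compared with a little arithmetic. The sketch below uses only figures copied from those tables; the variable names are illustrative. Since the exact number of calls between the snapshots is not known ("a few thousand"), it does not attempt a per-call figure.

```python
# Totals and per-kind sizes copied from the two heapy snapshots.
total_at_start = 14541184          # bytes, when the script started
total_after = 334836816            # bytes, after a few thousand calls

static_tuple_count = 211711        # StaticTuple objects in second snapshot
static_tuple_bytes = 302202312     # size heapy attributes to them
str_bytes_at_start = 5017480
str_bytes_after = 19002840

# Overall heap growth between the two snapshots.
total_growth = total_after - total_at_start

# StaticTuple does not appear in the first snapshot's top ten, so its
# second-snapshot size is treated here as (approximately) all growth.
static_tuple_share = float(static_tuple_bytes) / total_growth

# Growth attributed to plain strings, for comparison.
str_growth = str_bytes_after - str_bytes_at_start

# Average size heapy reports per StaticTuple object.
avg_static_tuple_bytes = float(static_tuple_bytes) / static_tuple_count

print("total growth (bytes): %d" % total_growth)
print("StaticTuple share of growth: %.1f%%" % (100 * static_tuple_share))
print("str growth (bytes): %d" % str_growth)
print("average bytes per StaticTuple: %.0f" % avg_static_tuple_bytes)
```

By this accounting, nearly all of the growth is attributed to StaticTuple objects, with strings a distant second.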

