produce memory overhead in bazaar
Lukas Diekmann
Lukas.Diekmann at uni-duesseldorf.de
Fri Dec 17 10:52:42 GMT 2010
Hi,
I finally managed to run some interesting tests using Bazaar as a
benchmark, and this one might be of interest to you, too. To start my
analysis, I added a hook directly after pack(), as you suggested:
+++ b/bazaar/bzrlib/repofmt/groupcompress_repo.py Fri Dec 03 11:28:52 2010 +0100
@@ -806,6 +806,9 @@
if packer.new_pack is not None:
packer.new_pack.abort()
raise
+ # HOOK IT BEFORE MEMORY IS CLEANED UP
+ import __main__
+ __main__.b.hook("pack")
if result is None:
return
for pack in packs:
The most interesting part is the string analysis. It iterates over all
strings and counts those that are in memory multiple times. These are
the top 100 when using Bazaar to check out the project Exaile:
(String, # of duplicates)
('1964893 399452 0 0', 8427),
('TREE_ROOT', 4198),
('exaile.py-20070730111123-cq63edh75d6mjeva-5', 429),
('main.py-20080510050432-go6cebudunvsrkl3-4', 291),
('playlist.py-20080510050432-go6cebudunvsrkl3-6', 247),
('xlpanels.py-20070730111123-cq63edh75d6mjeva-90', 223),
('playlist.py-20080504232136-bmpyf7bmbvzvbuen-1', 222),
('xlxlmisc.py-20070730111123-cq63edh75d6mjeva-97', 214),
('info at noctus.net-20100503094641-dwzxd23nexuw58ub', 200),
('track.py-20080508104638-tddrkspfa0rsc8u8-1', 197),
('exaile.glade-20070730111123-cq63edh75d6mjeva-7', 187),
('main.py-20080507013411-26y88jms0b60ye65-1', 186),
('collection.py-20080621013840-g6sabh1asaapcdun-1', 186),
('__init__.py-20080510050432-go6cebudunvsrkl3-3', 169),
('7286370 341644 0 0', 167),
('reacocard at gmail.com-20090103204839-u7oazgc65dz4m31r', 160),
('xltrackslist.py-20070730111123-cq63edh75d6mjeva-96', 154),
('collection.py-20080504071010-t4xlw4seg0cp94pk-1', 154),
('trackdb.py-20080507125817-a6le3fzshmo63sx1-1', 149),
('reacocard at gmail.com-20091027023020-8yj35u2ly25rxgbr', 144),
('guiutil.py-20080622023948-mzbi0gv18qyp5ahq-1', 138),
('xltracks.py-20070730111123-cq63edh75d6mjeva-95', 130),
('reacocard at noslor-20080605044456-8f3ar8pogu3tgu4g', 128),
('planning-20080305220822-131lki708z6r8s2i-1', 119),
('synic at liandrin-20080607184730-bwfd9jt35f95nfji', 118),
('player.py-20080430224829-bpokkcfnq4mzplw4-2', 110),
('bzrlib', 105),
('xlplayer.py-20070730111123-cq63edh75d6mjeva-186', 100),
('_fields', 99),
('__module__', 98),
('_ast', 98),
('cover.py-20080701183209-ipla0796uoobyigr-1', 97),
('xlprefs.py-20070730111123-cq63edh75d6mjeva-91', 95),
('reacocard at gmail.com-20090825213545-1jmzj2yn4ulwukmg', 93),
('xlguimain.py-20070730111123-cq63edh75d6mjeva-302', 91),
('makefile-20080510050432-go6cebudunvsrkl3-2', 90),
('info at noctus.net-20091114220525-6h27i4mwhfzntqj4', 89),
('vcs-imports at canonical.com-20060908011324-ukdvjxyti4wkqzzk', 89),
('common.py-20080425213358-52ovn5q0g2mvnqku-2', 88),
('files.py-20080707205416-7k1w79e3rj1gl7ow-2', 87),
('info at noctus.net-20090906110654-xx9hfq4dawn65e92', 87),
('playlists.py-20080625203047-buda1p984ltp01rb-1', 84),
('\n', 83),
('info at noctus.net-20090908160310-dz4bco9seqh1n68u', 83),
('main.glade-20080510050442-8w34v8fkrke45r4h-4', 80),
('reacocard at gmail.com-20090523042220-r5xlyw0z87yoxj8j', 80),
('info at noctus.net-20100406082113-609u65n9imoeqfaq', 78),
('tray.py-20080719172235-fwn006yx8brqtpaa-1', 75),
('cover.py-20080610172721-lg8oleldtdksdj35-1', 75),
('xldbus.py-20080619232920-ijviqbuvjhjov4i7-1', 74),
('mmwidgets.py-20090719022656-1cd82gy61g8ldx48-6', 72),
('event.py-20080505025633-cxf1rmnq0gu1eozo-1', 71),
('makefile-20070730111123-cq63edh75d6mjeva-3', 68),
('reacocard at gmail.com-20090824173750-1kelf127a5vn4vhv', 66),
('10386871 435944 0 0', 65),
('xlmedia__init__.py-20070730111123-cq63edh75d6mjeva-195', 65),
('__init__.py-20090719022656-1cd82gy61g8ldx48-3', 64),
('reacocard at gmail.com-20100603040032-7vm8pnb9nqc6h5j0', 63),
('settings.py-20080509024721-a51k983ozeea5yd7-1', 61),
('radio.py-20080621041236-95iuv1voihwi66jw-1', 61),
('menu.py-20080709231641-oc73utdfw6f7ehxx-1', 61),
('reacocard at gmail.com-20090715233013-7nii4l5uj4gb8tgb', 61),
('xlmedia.py-20070730111123-cq63edh75d6mjeva-89', 60),
('codehost at crowberry-20090923094129-x2s4tq8jfim1sfuy', 59),
('xlguiplaylist.py-20070730111123-cq63edh75d6mjeva-299', 59),
('codehost at crowberry-20090922094559-mu49udturjw7w8ia', 59),
('reacocard at gmail.com-20090824194837-is664airpn9ix8g0', 59),
('scrobbler.py-20080606202457-9rbtmjxivhtyd4m2-3', 59),
('reacocard at gmail.com-20090921152045-c7p3zpvku5ofjcjy', 59),
('de.po-20090824171328-0pld09prohqrnczw-11', 58),
('reacocard at gmail.com-20090606230001-jqr8f6b13s330uzs', 57),
('xldbusinterface.py-20070730111123-cq63edh75d6mjeva-88', 56),
('reacocard at gmail.com-20091210050607-pjpy3p8g6c12zhsz', 53),
('_base.py-20080728021953-vrajrkl29v9944st-4', 52),
('plugins.py-20080608202952-hap2q109gsirnwwg-3', 52),
('9181708 446797 0 0', 51),
('preferences.py-20081006235223-occfp2wtda9jr45m-1', 51),
('xllibrary.py-20070730111123-cq63edh75d6mjeva-300', 51),
('__init__.py-20100514163816-dnh261on7zuvo275-3', 51),
('properties.py-20090726165426-nfqdn2w8g2lu2bs7-1', 50),
('pluginsgui.py-20070730111123-cq63edh75d6mjeva-150', 50),
('xlcovers.py-20070730111123-cq63edh75d6mjeva-86', 46),
('__init__.py-20080728021953-vrajrkl29v9944st-2', 46),
('info at noctus.net-20100522083438-9f28pl9m3dk1nmzi', 46),
('info at noctus.net-20100615173914-eqa9ozv5eimmhoib', 46),
('9628505 399422 0 0', 45),
('messages.pot-20090824222709-mphqez9xqt90jjdp-1', 44),
('main.ui-20091114220257-ovw956ou3qvr5rl3-21', 43),
('formatter.py-20100331175756-408v7sywf63w5dql-1', 43),
('engine_normal.py-20090615224213-tlh7i9mrz998pler-4', 43),
('osd.py-20081029211838-s4nc1q6b3w4iicyh-1', 42),
('xldb.py-20070730111123-cq63edh75d6mjeva-87', 41),
('_base.py-20090615224213-tlh7i9mrz998pler-3', 41),
('3053419 601212 0 0', 41),
('xlguiqueue.py-20090411062915-dmz9apxcc2cq353f-1', 40),
('arolsen at gmail.com-20090726165849-gjd3fnxc499xpyb7', 40),
('__init__.py-20090726165845-r0p01nx1zmbgl1hb-3', 39),
('xltrack.py-20070730111123-cq63edh75d6mjeva-94', 38),
('3654631 361603 0 0', 37),
dgets.py-20081020172328-s6ye3jz6b754mzki-3', 37)
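The analysis tool itself is not shown in this mail. The following is one rough way to produce such a duplicate-string count on CPython. It is my assumption, not necessarily how the PyPy tooling works.

The cyclic GC does not track str objects. The sketch therefore starts from every GC-tracked object, walks referents to reach the strings, and counts how many distinct objects share each value. Because the heap changes while it is being walked, the counts are approximate.

```python
import gc
from collections import Counter


def iter_reachable():
    """Yield every object reachable from the GC-tracked objects, once each."""
    roots = gc.get_objects()
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        yield obj
        # get_referents() also walks containers the GC has untracked,
        # such as tuples that hold only strings.
        stack.extend(gc.get_referents(obj))


def duplicate_strings(top=None):
    """Return (value, copies) pairs for strings stored as several objects."""
    copies = Counter()
    for obj in iter_reachable():
        if type(obj) is str:
            copies[obj] += 1  # one increment per distinct str object
    dups = [(value, n) for value, n in copies.items() if n > 1]
    dups.sort(key=lambda pair: pair[1], reverse=True)
    return dups[:top]
```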
Maybe you could save some memory (in this case 15% of the total) by using
intern()? I also did a full memory analysis. Here is an image of the
ContainerAnalysis of Bazaar (checking out Exaile):
(I hope the mailing list supports images.)
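The intern() suggestion works like this: interning maps equal strings to one shared object, so each file id or revision id would be stored once rather than once per copy. Python 2, which bzr ran on at the time, exposes this as the builtin intern(); Python 3 moved it to sys.intern().

Below is a minimal Python 3 sketch. The helper names and the (file_id, revision_id) record shape are hypothetical and are not bzrlib's actual data structures.

```python
import sys


def build_index(records, use_intern):
    """Store (file_id, revision_id) pairs, optionally interning both strings."""
    index = []
    for file_id, revision_id in records:
        if use_intern:
            # sys.intern() returns one canonical object per distinct value.
            file_id = sys.intern(file_id)
            revision_id = sys.intern(revision_id)
        index.append((file_id, revision_id))
    return index


def string_bytes(index):
    """Approximate bytes used by the distinct string objects in the index."""
    sizes = {}
    for pair in index:
        for s in pair:
            sizes[id(s)] = sys.getsizeof(s)
    return sum(sizes.values())
```

Interning has its own cost, a lookup in the interpreter's intern table on every call. It therefore pays off mainly for values that repeat many times, like the file and revision ids in the list above.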
I hope this will be of some use to you. :)
Regards,
Lukas
John Arbash Meinel wrote:
> On 11/29/2010 3:00 AM, Martin Pool wrote:
> > On 26 November 2010 21:44, Lukas Diekmann
> > <Lukas.Diekmann at uni-duesseldorf.de> wrote:
> >> Hi there,
> >>
> >> My name is Lukas and I am working on optimizations for PyPy. PyPy is a
> >> Python interpreter written in Python itself and aims to simplify,
> >> improve and speed up Python interpretation. For my master's thesis I am
> >> writing several benchmarks based on real programs that use a lot of
> >> memory. Thus I'm hoping to find out which datatypes produce the most
> >> overhead, so I can extend the PyPy interpreter and JIT by implementing
> >> additional datatypes that are optimized for their purpose (for example,
> >> a list containing only integers).
> >>
> >> I think that your program could be a candidate for benchmarking because
> >> it seems to eat a lot of memory. But since I am new to Bazaar, it is
> >> hard for me to tell, or to produce a situation where it uses the most
> >> memory. So I wanted to ask whether you could tell me a scenario that is
> >> worth looking at and how I can reproduce it.
> >>
> >> If you want to know more about this project have a look at
> >> http://morepypy.blogspot.com/2010/08/call-for-benchmarks.html
> > Hi Lukas, that sounds like a pretty good project, both for PyPy and
> > Bazaar.
>
> > The short answer is that fetching a large repository like lp:emacs or
> > lp:launchpad and then running 'bzr pack' or committing a lot of
> > changes is a decent test.
>
> > John Arbash Meinel and others have been doing some memory optimization
> > work and you can find some descriptions by looking back through the
> > list archives.
>
> > Another option is to look for bugs tagged 'memory'.
>
> > And finally the bzr usertest plugin has some macrobenchmarks.
>
> > Stay in touch,
>
> The only problem with the test is that PyPy isn't compatible with our
> Pyrex/C extensions. And while the pure Python code always works, it
> isn't 100% identical, and was certainly never tuned for memory or
> performance. (It was tuned for readability, so that you have a reference
> to crib from, etc.)
>
> So while it is certainly true that doing "pypy bzr branch lp:launchpad"
> is going to have very high memory consumption, it isn't necessarily
> because of pypy.
>
> John
> =:->
>
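Martin suggests concrete scenarios, such as running 'bzr pack' in a freshly fetched large branch. To put numbers on them, one option on Unix is to read the child process's peak resident set size after it exits. The following sketch uses only the standard library. `peak_child_rss` is a hypothetical helper name.

```python
import resource
import subprocess
import sys


def peak_child_rss(cmd, cwd=None):
    """Run cmd to completion and return the peak RSS of waited-for children.

    RUSAGE_CHILDREN reports the maximum over *all* terminated children this
    process has waited for, so use a fresh process per measurement.
    ru_maxrss is in kilobytes on Linux but in bytes on macOS.
    """
    subprocess.run(cmd, cwd=cwd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Self-check: measure a trivial child interpreter.
    print(peak_child_rss([sys.executable, "-c", "pass"]))
```

For a real measurement you might call `peak_child_rss(["bzr", "pack"], cwd="emacs-branch")`, where `emacs-branch` is a hypothetical local branch of lp:emacs. Keep John's caveat in mind: under PyPy, bzr runs its untuned pure-Python code paths.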