Smart revision fetching update

John Arbash Meinel john at arbash-meinel.com
Thu Aug 9 15:15:25 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Hi all,
> 
> Thanks to a dose of the flu I didn't get support for transferring revision data
> efficiently over the smart server protocol ready in time for 0.19 (0.90).  The
> good news is it will have most of the 0.91 cycle to get settled and nicely
> polished.
> 
> Of the code in my http://people.ubuntu.com/~andrew/bzr/repo-refactor branch,
> about 2/3rds has been extracted into independent branches and reviewed already,
> so I'll land that as soon as 0.90 opens.  The remaining 1/3rd basically adds a
> Repository.fetch_revisions smart method and teaches the client side to use it,
> and I'll send that to the list within a day.
> 
> The work to be done from there:
> 

...

>   * Measure, and probably fix, memory consumption.  I expect at the moment my
>     code is buffering the entire request/response bodies in memory, which is Not
>     Good.  It would be nice to have something similar to the benchmark test
>     suite that measures the memory high-water mark of various operations.  Or
>     perhaps just set a limit with ulimit/setrlimit, and then feed in a data
>     set larger than the limit: if it trips the limit, the memory consumption
>     needs fixing.  A quick and dirty hack would be to limit requests to e.g.
>     100 revisions at a time, but I think we can fix this properly.
> 
> And, of course, test this as much as possible in real usage!  :)


I can almost guarantee that it does buffer the entire body. Specifically in:
 bzrlib.smart.protocol.?.read_body_bytes (line 356)

on the first call to 'read_body_bytes()' you buffer the entire body into
a StringIO, which you then use to return the requested chunks back to
the user.
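
To make the difference concrete, here is a minimal sketch (modern Python,
not the actual bzrlib code) contrasting the buffer-everything pattern with
reading chunks straight off the stream. The function names and the
chunk_size parameter are made up for illustration.

```python
import io


def read_body_buffered(stream, chunk_size=4096):
    """Hypothetical: buffer the whole body first, then hand out chunks.

    Peak memory grows with the size of the body.
    """
    buffered = io.BytesIO(stream.read())  # entire body held in memory here
    while True:
        chunk = buffered.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_body_streaming(stream, chunk_size=4096):
    """Hypothetical: read one chunk at a time from the underlying stream.

    Peak memory stays around chunk_size regardless of body size.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Both generators yield the same bytes; only the first one has to hold the
whole body at once.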

I'm attaching a fairly simple script which I used when doing memory
profiling in the past. It is a bit crufty, but the basic idea is that
you run "watch_mem.py bzr command foo".

As long as you are running on a Linux that supports /proc/PID/status,
it will poll that file every 0.1 seconds and log the statistics it
finds there.
You can customize what it reports a little bit by hacking the code (it
isn't very configurable).

It defaults to logging to ~/mem.log, writing a CSV file with fields like
VmPeak, VmSize, VmRSS, etc.

The big trick is just that it spawns the child subprocess, so it knows
what PID to track. And then it has a bit of code to turn the output of
/proc/PID/status into a dictionary that it can use for logging.
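
Since the archive scrubs attachments, here is a rough sketch of that idea
in modern Python. It is not the attached script: parse_status, watch and
their parameters are names invented for this example, and it assumes a
Linux /proc filesystem.

```python
import csv
import subprocess
import time


def parse_status(text):
    """Turn 'Key:<tab>value' lines into a dict; values stay raw strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def watch(argv, log_path, interval=0.1, keys=('VmPeak', 'VmSize', 'VmRSS')):
    """Spawn argv, poll /proc/PID/status, and log the chosen keys as CSV."""
    child = subprocess.Popen(argv)  # spawning the child gives us its PID
    start = time.monotonic()
    with open(log_path, 'w', newline='') as log:
        writer = csv.writer(log)
        writer.writerow(['elapsed'] + list(keys))
        while child.poll() is None:
            try:
                with open('/proc/%d/status' % child.pid) as status:
                    fields = parse_status(status.read())
            except OSError:
                break  # the process went away between poll() and open()
            elapsed = round(time.monotonic() - start, 1)
            # Keys that are missing are logged as empty cells.
            writer.writerow([elapsed] + [fields.get(k, '') for k in keys])
            time.sleep(interval)
    return child.wait()
```

Something like watch(['bzr', 'command', 'foo'], '/tmp/mem.log') would then
play the role of "watch_mem.py bzr command foo".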

Also, note that it doesn't interpret the output strings at all. I
believe status generally reports sizes in kB, but these will be written
to the log verbatim, e.g. '5000 kB'.
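
If you want numbers rather than strings when post-processing the log, a
small helper along these lines would do. This is a sketch and not part of
the script; kb_field_to_bytes is a made-up name, and it assumes that the
"kB" in /proc/PID/status means units of 1024 bytes.

```python
def kb_field_to_bytes(value):
    """Convert a raw status value such as '5000 kB' to an integer byte count.

    Assumes the 'kB' unit denotes 1024 bytes.
    """
    parts = value.split()
    if len(parts) != 2 or parts[1] != 'kB':
        raise ValueError('unexpected status value: %r' % (value,))
    return int(parts[0]) * 1024
```

Values that are not in the "<number> kB" form (for example the Name field)
raise ValueError instead of being silently guessed at.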

Anyway, I used that to monitor memory consumption over time for the
bottom graph of:
http://bazaar-vcs.org/Performance/0.9

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGuyF9JdeBCYSNAAMRArpkAJ0SlYohpDrtLqrDenTHgVtJz8DcoACgyzU2
33j6ba9WWesve3Aq/ftilXc=
=QO9n
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: watch_mem.py
Type: text/x-python
Size: 1543 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070809/bf92ee7b/attachment.py 

