[MERGE] Simple fix for "bzr log file"

John Arbash Meinel john at arbash-meinel.com
Fri Sep 19 02:47:07 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> There are better fixes possible, but this is my first go at cleaning up the
> memory consumption for "bzr log file".
> 
> Basically, we create the per-file graph, and the whole-tree graph. And then
> for each node in the whole-tree, we track what per-file ancestry we have.
> 
> That way we can determine which nodes have merged a given file change. We can
> ultimately do it in a better way, but this change is incrementally better than
> what we have.
> 
> Basically, the old code would *always* create a new set() whenever you have a
> revision with more than 1 parent. The new code only creates a new set() if the
> parents are actually different for that file.
> 
> This drops memory consumption for the mysql file from
> large-enough-that-it-swaps-and-I-have-to-kill-it, to only 400MB.
> There is a very simple change, which can drop the memory consumption down to
> about 277MB, by caching tuples instead of frozensets. But it means we have to
> recompute the sets later on.
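
To illustrate the set-sharing idea described in the quoted text, here is a minimal sketch. It is not the code from the attached patch: the function name per_file_ancestry, its arguments, and the assumption that revisions arrive in topological order (parents first) are all hypothetical. It only shows how reusing a parent's frozenset when all parents agree avoids allocating a new set at every merge.

```python
def per_file_ancestry(parent_map, modified, order):
    """Map each whole-tree revision to the per-file revisions it has merged.

    parent_map: dict of revision -> tuple of parent revisions (whole tree)
    modified:   set of revisions that actually changed the file
    order:      revisions in topological order, parents before children
    (All three names are assumptions made for this sketch.)
    """
    ancestry = {}
    empty = frozenset()
    for rev in order:
        parent_sets = [ancestry[p] for p in parent_map[rev] if p in ancestry]
        if not parent_sets:
            current = empty
        else:
            current = parent_sets[0]
            for other in parent_sets[1:]:
                # Only allocate a new set when the parents really differ
                # for this file; otherwise keep sharing the same object.
                if other is current or other == current:
                    continue
                current = current | other
        if rev in modified:
            current = current | frozenset([rev])
        ancestry[rev] = current
    return ancestry


if __name__ == "__main__":
    graph = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C")}
    result = per_file_ancestry(graph, {"A"}, ["A", "B", "C", "D"])
    # D merges B and C, but neither touched the file, so D reuses A's set.
    print(result["D"] is result["A"])
```

In a history where most merges do not touch the file, nearly every revision ends up pointing at a shared frozenset rather than owning a copy.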

Attached is the same patch, only with a small update to the GraphIndex logic.
Now when it calls _buffer_all it doesn't automatically compute the
"_nodes_by_key" logic. This is only necessary if using "iter_entries_prefix",
but we are moving away from that functionality anyway (the code here asks for
(file_id, revision_id) tuples.)
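
As a rough sketch of the "don't build _nodes_by_key until someone needs it" idea: the class below is hypothetical and much simpler than bzrlib's GraphIndex. LazyIndex, its constructor, and the single-prefix form of iter_entries_prefix are assumptions for illustration only. The point is that exact-key lookups never pay for the prefix map.

```python
class LazyIndex:
    """Toy index keyed by (file_id, revision_id) tuples (hypothetical)."""

    def __init__(self, entries):
        self._entries = entries      # iterable of (key, value) pairs
        self._nodes = None           # filled in by _buffer_all()
        self._nodes_by_key = None    # built only for prefix queries

    def _buffer_all(self):
        # Load every node once; deliberately does NOT build the prefix map.
        if self._nodes is None:
            self._nodes = dict(self._entries)

    def _get_nodes_by_key(self):
        # Build the prefix map on first use and cache it afterwards.
        if self._nodes_by_key is None:
            self._buffer_all()
            by_key = {}
            for key, value in self._nodes.items():
                by_key.setdefault(key[0], {})[key] = value
            self._nodes_by_key = by_key
        return self._nodes_by_key

    def iter_entries(self, keys):
        # Exact-key lookup: only needs the flat node dict.
        self._buffer_all()
        for key in keys:
            if key in self._nodes:
                yield key, self._nodes[key]

    def iter_entries_prefix(self, prefix):
        # Prefix lookup: the only caller that needs _nodes_by_key.
        return sorted(self._get_nodes_by_key().get(prefix, {}).items())


if __name__ == "__main__":
    index = LazyIndex([(("f1", "r1"), "a"), (("f1", "r2"), "b")])
    print(list(index.iter_entries([("f1", "r2")])))
    print(index._nodes_by_key is None)  # prefix map still unbuilt
```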

With this patch, I shave off 42-35 = 7s total time, and it saves 410 - 387 =
23MB of memory.

We were spending 27s in _buffer_all. Actually, getting rid of the
'_nodes_by_key' *might* explain some of what Gary was finding (where asking for
all keys is slower than asking for just one). But I wouldn't guarantee that.

There is also a small amount of time saved by using a list comprehension
instead of a generator. Perhaps something like 1.5s or so. I'm not confident
on that, because I don't think lsprof is accurate about it. (it claims ~4s
versus 2.6s.)
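
Since a profiler can distort the cost of a generator, a wall-clock comparison with timeit is one way to cross-check a list comprehension against a generator expression. This is only a sketch: the two builder functions and the sample data are made up and are not the code touched by the patch, and the timings will vary by machine and Python version.

```python
import timeit


def build_with_generator(pairs):
    # Feed dict() from a generator expression.
    return dict((key, value) for key, value in pairs)


def build_with_list(pairs):
    # Feed dict() from a fully built list.
    return dict([(key, value) for key, value in pairs])


if __name__ == "__main__":
    # Hypothetical data shaped like (file_id, revision_id) keys.
    pairs = [(("file-id", "rev-%d" % i), i) for i in range(100000)]
    for fn in (build_with_generator, build_with_list):
        elapsed = timeit.timeit(lambda: fn(pairs), number=20)
        print("%s: %.3fs" % (fn.__name__, elapsed))
```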

With this patch, _buffer_all accounts for 99% of the time spent in
"get_parent_map(text_keys)", and of that 67% is spent in _parse_lines, 23% is
spent in _resolve_references, and 10% in _buffer_all itself.

(Relative to the total 'bzr log file' time, 62% of that time is spent in
various forms of get_parent_map, and 55% is spent in _buffer_all.)
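
For a sense of scale, the percentages above can be combined into shares of the total 'bzr log file' run. This assumes the 67/23/10 split applies to the time inside _buffer_all, which is nearly the same thing as get_parent_map(text_keys) given the 99% figure. The helper name share_of_total is made up for this sketch.

```python
# Figures quoted in the message; the split is assumed to apply to
# the time spent inside _buffer_all.
BUFFER_ALL_SHARE = 0.55
BREAKDOWN = {
    "_parse_lines": 0.67,
    "_resolve_references": 0.23,
    "_buffer_all (own code)": 0.10,
}


def share_of_total(breakdown, parent_share):
    """Scale each child's share by its parent's share of the total run."""
    return {name: parent_share * frac for name, frac in breakdown.items()}


if __name__ == "__main__":
    for name, share in share_of_total(BREAKDOWN, BUFFER_ALL_SHARE).items():
        print("%-24s %.1f%% of total" % (name, share * 100))
```

Under that assumption _parse_lines alone is roughly 37% of the whole command, _resolve_references about 13%, and _buffer_all's own code about 5.5%.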

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFI0wSbJdeBCYSNAAMRAp0GAKDNneXa/Ebh0Q3Fyj0dRfox9akMHQCgjSnF
kqSyCcmCwMPlsXcBGnR1PXw=
=22jK
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lighter_file_log2.patch
Type: text/x-diff
Size: 23067 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080918/5113fd9a/attachment-0001.bin 

