1.0alpha bundles performance

Aaron Bentley aaron.bentley at utoronto.ca
Sat Jun 16 17:30:44 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've done some further work on bundles-- I've been able to extract the
SequenceMatcher.get_matching_blocks output from knit line-deltas*, so
that I don't have to redo those comparisons.  That means I only use a
SequenceMatcher once per 4 deltas I generate, which drops generation
time from 6 seconds to 3 seconds.

At this point, it would be really nice to have the graph API, because
other costs have been reduced, so finding a common ancestor takes 23% of
runtime.  A stopgap measure would be to disable topological sorting on
get_ancestry when we don't need it.

Diff generation is also a big cost, but since I'm doing sequence
matching only on..

1. versions that have no fulltext
2. multi-parent versions against their non-leftmost parent

...we don't really have a choice.  These diffs are what make 1.0a
bundles so small.  Similarly, reading knits takes time.  Probably too
much, but speeding up knits is really beyond the scope of this project.

Once piece of good news: it does scale.  I was able to generate a bundle
of the full history of bzr in 4 minutes.  It's a mere 8.9 M.  The
callgraph has no big surprises-- it's dominated by sequence matching,
but that can't be avoided.

I should also mention that there is no observable overhead to using the
pack format for writing.

Aaron

* Technically, the input is the line-delta and the number of lines in
the target file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGdA/F0F+nu1YWqI0RAh7nAKCFD0TvZpLIkZAD39jLDq0uQKy44QCZAbml
AWt3u2OIuwGc5RxBtV1RFZg=
=GXq1
-----END PGP SIGNATURE-----



More information about the bazaar mailing list