[RFC] ancestry cache
Aaron Bentley
aaron.bentley at utoronto.ca
Fri Feb 17 03:05:33 GMT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John A Meinel wrote:
| Aaron Bentley wrote:
|>I've been fiddling with progress bars for merge application, and it
|>turns out we spend a lot of our time selecting a merge base. What
|>surprised me is that the largest part of this is the time generating a
|>graph of common ancestors.
|>
|>So I've implemented a prototype ancestry cache. Patch attached. Does
|>this seem worth pursuing?
| I realize this has the advantage that it handles ghost revisions. But is
| this genuinely faster than reading the Weave prefix of inventory.weave.
| It actually seems like the weave prefix should be faster because you
| have fewer bytes to read.
I haven't benched against that. I didn't want to change the behavior of
revision_graph. In this case, I'm fairly sure that direct descendants
of ghosts would be converted into descendants of the null revision,
which would be very bad. revision_graph also has callers that are
looking ghosts.
In any case, the cache is not a bottleneck.
| Also, in my store-escaping branch, I was thinking to allow '/' as a
| valid character in revision ids (then we wouldn't have to escape Arch
| fully qualified patch names), it is easy enough to escape them to write
| them onto disk. But I don't really care.
Using '/' is a hack. I originally wanted to use rio format, but I had
trouble using it. For one thing, it doesn't accept arbitrary bytes for
keys, so I couldn't use straight revision-ids as keys. For another, I
wasn't getting the same data out that I put in.
This won't be a merge request until the format is tidied up, if it
becomes a merge request at all.
Also, it doesn't handle non-Repository revision_sources, and that will
need to be fixed.
| + try:
| + parents = a_cache.get_parent_ids(line)
| + if len(parents) == 0:
| + parents = [NULL_REVISION]
| + except bzrlib.errors.NoSuchRevision:
| + if line == revision:
| + raise
| + parents = None
|
|
| It seems like this makes a_cache actually important, since it doesn't
| look anywhere else for the parents.
|
| Otherwise it would seem that you would use:
|
| try:
| parents = a_cache.get_parent_ids(line)
| except NoSuchRevision:
| rev = repository.get_revision(line)
| parents = rev.parent_ids
You're right; it's basically memoization of
repository.get_revision().parent_ids. It's only cache in the sense that
~ operations that destroy revisions might want to destroy it.
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFD9T190F+nu1YWqI0RAnnKAJ0elYhle+0GqgqaLjcKDdtZbfH9sACeMLHD
WNk7fNsMUaAdvvqgV8X7+DM=
=ErFA
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list