[RFC] ancestry cache

Fri Feb 17 03:05:33 GMT 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
| Aaron Bentley wrote:
|>I've been fiddling with progress bars for merge application, and it
|>turns out we spend a lot of our time selecting a merge base.  What
|>surprised me is that the largest part of this is the time generating a
|>graph of common ancestors.
|>
|>So I've implemented a prototype ancestry cache.  Patch attached.  Does
|>this seem worth pursuing?

| I realize this has the advantage that it handles ghost revisions. But is
| this genuinely faster than reading the Weave prefix of inventory.weave.
| It actually seems like the weave prefix should be faster because you
| have fewer bytes to read.

I haven't benched against that.  I didn't want to change the behavior of
revision_graph.  In this case, I'm fairly sure that direct descendants
of ghosts would be converted into descendants of the null revision,
which would be very bad.  revision_graph also has callers that are
looking ghosts.

In any case, the cache is not a bottleneck.

| Also, in my store-escaping branch, I was thinking to allow '/' as a
| valid character in revision ids (then we wouldn't have to escape Arch
| fully qualified patch names), it is easy enough to escape them to write
| them onto disk. But I don't really care.

Using '/' is a hack.  I originally wanted to use rio format, but I had
trouble using it.  For one thing, it doesn't accept arbitrary bytes for
keys, so I couldn't use straight revision-ids as keys.  For another, I
wasn't getting the same data out that I put in.

This won't be a merge request until the format is tidied up, if it
becomes a merge request at all.

Also, it doesn't handle non-Repository revision_sources, and that will
need to be fixed.

| +                    try:
| +                        parents = a_cache.get_parent_ids(line)
| +                        if len(parents) == 0:
| +                            parents = [NULL_REVISION]
| +                    except bzrlib.errors.NoSuchRevision:
| +                        if line == revision:
| +                            raise
| +                        parents = None
|
|
| It seems like this makes a_cache actually important, since it doesn't
| look anywhere else for the parents.
|
| Otherwise it would seem that you would use:
|
| try:
|   parents = a_cache.get_parent_ids(line)
| except NoSuchRevision:
|   rev = repository.get_revision(line)
|   parents = rev.parent_ids

You're right; it's basically memoization of
repository.get_revision().parent_ids.  It's only cache in the sense that
~ operations that destroy revisions might want to destroy it.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD9T190F+nu1YWqI0RAnnKAJ0elYhle+0GqgqaLjcKDdtZbfH9sACeMLHD
WNk7fNsMUaAdvvqgV8X7+DM=
=ErFA
-----END PGP SIGNATURE-----