[MERGE] RevisionSpec.as_revision_id()

Sat Mar 29 10:23:24 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Just as an aside, I don't know if people realize I'm on vacation for the next 3
weeks. My net access is pretty poor, so if people could submit the patches that
they approve, I would appreciate it. It also means that I'm unlikely to respond
thoroughly to reviews, though this one I feel is rather important. (It does
stuff like change '-r ancestor:' from being 2s+ to being near instantaneous on
Emacs stuff.)

Aaron Bentley wrote:
| John Arbash Meinel wrote:
|> ~   I think it is both clearer, *and* we should probably try to deprecate
|> ~   .in_history() at some point. Our internals pretty much universally
|> work in
|> ~   terms of revision ids. And generally where they don't, they should
|> be updated
|> ~   so that they do.
|
| Sure.  I'd add that with log, it really helps to know that the start and
| end revisions are in the lefthand ancestry.
|
| Also, I think we will probably want to support getting trees from
| revision specs at some point-- the 'current:' tree is an obvious one,
| but also perhaps merge preview trees.

This could be interesting, but it gets pretty hard to do from our apis. Figuring
out context becomes pretty difficult. Because sometimes you'll want at
WorkingTree context, and others a Branch. Unless you want the context to be
"magic" we then need to have a bunch of parameters for when you do have a tree,
or not.

We could certainly have a "as_tree(context)" sort of function, which would
return a RevisionTree when appropriate. However, if we were to do that, I would
want lazy RevisionTrees again (since you often don't need anything but the
.revision_id, and extracting the whole inventory just to get .revision_id is
foolish.)

|
|> === modified file 'bzrlib/branch.py'
|> --- bzrlib/branch.py	2008-03-14 10:55:37 +0000
|> +++ bzrlib/branch.py	2008-03-20 17:28:10 +0000
|> @@ -1375,8 +1375,8 @@
|>          """See Branch.set_revision_history."""
|>          if 'evil' in debug.debug_flags:
|>              mutter_callsite(3, "set_revision_history scales with history.")
|> -        self._clear_cached_state()
|>          self._write_revision_history(rev_history)
|> +        self._clear_cached_state()
|
| ^^^ what's the purpose of moving this?  The previous position gave
| write_revision_history the opportunity to populate the cache.

This wasn't my code, I thought he had tested it and found it was important to do
after. I'll try to check it. However, you removed the next line which is:

- -        self._clear_cached_state()
~         self._write_revision_history(rev_history)
+        self._clear_cached_state()
~         self._cache_revision_history(rev_history)
~         for hook in Branch.hooks['set_rh']:
~             hook(self, rev_history)

So this function is clearing the cached state and setting the final values at
the end of it. While it is certainly possible that _write_revision_history()
would be doing it, it may be that _write* should just write out the data, and
this is the layer where it gets cached.

|
|>          self._cache_revision_history(rev_history)
|>          for hook in Branch.hooks['set_rh']:
|>              hook(self, rev_history)
|> @@ -1809,13 +1809,29 @@
|
|>  class BzrBranch6(BzrBranch5):
|
|> +    def __init__(self, *args, **kwargs):
|> +        super(BzrBranch6, self).__init__(*args, **kwargs)
|> +        self._last_revision_info_cache = None
|> +        self._partial_revision_history_cache = None
|
| Hmm.  What happened to the identity map we were using for caching?
|

I'm not sure what identity map you are talking about. There is a
revision_id_to_revno_map that gets cached, but at the moment it is expected to
be a complete map or None, so it isn't relevant. I would like to make it a lazy
Map object when we have the ability to generate partial dotted revision numbers
without accessing all of history, but we aren't there yet.

|>      @needs_read_lock
|>      def last_revision_info(self):
|> -        revision_string = self.control_files.get('last-revision').read()
|> -        revno, revision_id = revision_string.rstrip('\n').split(' ', 1)
|> -        revision_id = cache_utf8.get_cached_utf8(revision_id)
|> -        revno = int(revno)
|> -        return revno, revision_id
|> +        """Return information about the last revision.
|> +
|> +        :return: A tuple (revno, revision_id).
|> +        """
|> +        if self._last_revision_info_cache is None:
|> +            revision_string = self.control_files.get('last-revision').read()
|> +            revno, revision_id = revision_string.rstrip('\n').split(' ', 1)
|> +            revision_id = cache_utf8.get_cached_utf8(revision_id)
|> +            revno = int(revno)
|> +            self._last_revision_info_cache = revno, revision_id
|> +        return self._last_revision_info_cache
|
| It seems like the actual retrieval of last_revision_info should be split
| out into a private method.  That separates the concerns of caching and
| parsing a revision_info.
|
|     def _last_revision_info(self):
|         revision_string = self.control_files.get('last-revision').read()
|         revno, revision_id = revision_string.rstrip('\n').split(' ', 1)
|         revision_id = cache_utf8.get_cached_utf8(revision_id)
|         revno = int(revno)
|         return revno, revision_id
|
|     @needs_read_lock
|     def last_revision_info(self):
|         """Return information about the last revision.
|
|         :return: A tuple (revno, revision_id).
|         """
|         if self._last_revision_info_cache is None:
|             self._last_revision_info_cache = self._last_revision_info()
|         return self._last_revision_info_cache
|

It doesn't seem worth it for how small this is. It is really just an "if None"
trap around the rest. And certainly we use the same construct in other places.
But if it is blocking the benefits of the rest of the patch, I'll make the change.

|> @@ -1848,10 +1865,18 @@
|>      def _gen_revision_history(self):
|>          """Generate the revision history from last revision
|>          """
|> +        if self._partial_revision_history_cache:
|> +            last_revision = self._partial_revision_history_cache.pop(-1)
|> +            partial_history = self._partial_revision_history_cache
|> +            partial_history.reverse()
|> +            self._partial_revision_history_cache = None
|
| ^^^ I don't see why _gen_revision_history should clear the partial
| history.  It seems like a wasted opportunity to cache the whole history.
|
| It doesn't seem particularly useful for the partial history to be stored
| in forward order, either.

The partial history is stored in descending order. (the first entry is
last_revision().) We have to reverse it because _gen_revision_history() is
requesting the data in forward order (revno 1 comes first, last_revision() is
the final entry in the list.)

The reason _gen_revision_history() clears the partial cache is because the
"revision_history()" function calls "self._cache_revision_history".

Which means that we already have a cache to look up the values in.
(self._revision_history_cache).

|
|> @@ -1969,6 +1994,47 @@
|>          revno = len(history)
|>          self.set_last_revision_info(revno, revision_id)
|
|> +    @needs_read_lock
|> +    def get_rev_id(self, revno, history=None):
|> +        """Find the revision id of the specified revno."""
|> +        if revno == 0:
|> +            return _mod_revision.NULL_REVISION
|> +
|> +        last_revno, last_revision_id = self.last_revision_info()
|> +        if revno == last_revno:
|> +            return last_revision_id
|> +
|> +        if revno <= 0 or revno > last_revno:
|> +            raise errors.NoSuchRevision(self, revno)
|> +
|> +        if history is None:
|> +            history = self._revision_history_cache
|> +        if history is not None:
|> +            return history[revno - 1]
|> +
|> +        distance = last_revno - revno
|> +        if self._partial_revision_history_cache:
|> +            if len(self._partial_revision_history_cache) > distance:
|> +                return self._partial_revision_history_cache[distance]
|> +            distance -= len(self._partial_revision_history_cache) - 1
|> +            revision_id = self._partial_revision_history_cache[-1]
|> +        else:
|> +            self._partial_revision_history_cache = [last_revision_id]
|> +            revision_id = last_revision_id
|> +
|> +        history_iter = self.repository.iter_reverse_revision_history(
|> +            revision_id)
|> +        # iter_reverse_revision_history returns the revision_id we passed in as
|> +        # the first node, so skip past it.
|> +        history_iter.next()
|> +        for i in xrange(distance):
|> +            try:
|> +                revision_id = history_iter.next()
|> +            except StopIteration:
|> +                raise errors.NoSuchRevision(self, revno)
|> +            self._partial_revision_history_cache.append(revision_id)
|> +        return revision_id
|
| ^^^ I think this could be broken up a lot more cleanly; introduce an
| _iter_reverse_revision_history:
|
| def _iter_reverse_revision_history(self):
|     if self._partial_history_cache is None:
|         self._partial_history_cache = []
|     for revision_id in self._partial_history_cache:
|         yield revision_id
|     for revision_id in self.repository._iter_reverse_revision_history(
|         revision_id):
|         self._partial_history_cache.append()
|         yield revision_id
|
| And then get_rev_id is something like:
| def get_rev_id(revno)
|     cur_revno, last_revision = self.get_revision_info()
|     if last_revno < revno:
|         raise NoSuchRevision()
|     for revision_id in self._iter_reverse_revision_history():
|         if cur_revno == revno:
|             return revision_id
|         revno -= 1
|     else:
|         raise NoSuchRevision()

Except then you spend the time to iterate through a list instead of just using
an integer offset. Why do "revno -= 1" 500 times when you can just say "oh, my
list is 600 long, grab the 500th".

I'm happy to factor things around a little bit if you find the function hard to
read, but I don't think we want to iterate when we can find the answer from a
simple offset.

Also, we *should* be using _revision_history_cache if present. We want more and
more code to *not* fill it out, but as long as we have code that has called
"b.revision_history()" we may as well make use of it.

I'll try to put something together more tasteful, but I don't think your
solution is correct.

...

|
|> +    def test_as_revision_id(self):
|> +        self.assertAsRevisionId('r1', '1')
|> +        self.assertAsRevisionId('r2', '2')
|> +        self.assertAsRevisionId('r1', '-2')
|> +        self.assertAsRevisionId('r2', '-1')
|> +        self.assertAsRevisionId('alt_r2', '1.1.1')
|
| Since you've stripped ensure_null out of the callsites, I'd like a test
| that each revision spec emits NULL_REVISION for the null revision, not None.
|
| Aaron

I'm happy to add those tests.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH7hibJdeBCYSNAAMRAo3eAKCc8Q6H5Gd9MwcjtxKbSU08irj23gCggGZp
P+NPew9SVajdvImRwLF/UWs=
=+w4W
-----END PGP SIGNATURE-----