[MERGE] RevisionSpec.as_revision_id()

Sat Mar 29 16:58:55 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> Just as an aside, I don't know if people realize I'm on vacation for the
> next 3
> weeks. My net access is pretty poor, so if people could submit the
> patches that
> they approve, I would appreciate it.

I'll apply my own review comments and re-submit it.  It's unfortunate
that no one found the time to review it before you left.

> | Also, I think we will probably want to support getting trees from
> | revision specs at some point-- the 'current:' tree is an obvious one,
> | but also perhaps merge preview trees.
> 
> This could be interesting, but it gets pretty hard to do from our apis.
> Figuring
> out context becomes pretty difficult. Because sometimes you'll want at
> WorkingTree context, and others a Branch. Unless you want the context to be
> "magic" we then need to have a bunch of parameters for when you do have
> a tree,
> or not.

The signature would probably have to be something like:

RevisionSpec.as_tree(branch, workingtree)

> We could certainly have a "as_tree(context)" sort of function, which would
> return a RevisionTree when appropriate. However, if we were to do that,
> I would
> want lazy RevisionTrees again (since you often don't need anything but the
> .revision_id, and extracting the whole inventory just to get
> .revision_id is
> foolish.)

I'm not sure when we would need them to be lazy-- so they could be
passed in as context?  When they're returned?

If all we need is the revision id, I certainly think it would be a bad
idea to use as_tree.  But if the operation we're doing requires it
(merge, diff, status, etc) then we'll need it unless we error out.  And
accelerating error conditions is not a high priority for me.

> | ^^^ what's the purpose of moving this?  The previous position gave
> | write_revision_history the opportunity to populate the cache.
> 
> This wasn't my code, I thought he had tested it and found it was
> important to do
> after. I'll try to check it. However, you removed the next line which is:

When you submit something, you should take responsibility for it as if
it was your code.  It's possible that his reasons are now obsolete.
Anyhow, I'll try moving it back and see if it breaks.

> | Hmm.  What happened to the identity map we were using for caching?
> |
> 
> I'm not sure what identity map you are talking about.

Branches used to use transactions for caching so that they didn't have
to explicitly flush all the caches themselves.  The transactions had an
identitymap.  That's the kind of code I was expecting to find here, but
it seems to have all been removed, and gannotate's not good at telling
me when (or why!).

> It doesn't seem worth it for how small this is. It is really just an "if
> None"
> trap around the rest.

- From an extensibility perspective, I think it's the right thing to do.
The way that the last revision info is likely to be the key difference
between branch formats going forward.

Because it idents the old code, it increases the size of the diff
without adding any value.

> And certainly we use the same construct in other
> places.
> But if it is blocking the benefits of the rest of the patch, I'll make
> the change.

I'll do it.

> The reason _gen_revision_history() clears the partial cache is because the
> "revision_history()" function calls "self._cache_revision_history".
> 
> Which means that we already have a cache to look up the values in.
> (self._revision_history_cache).

That cache should be deprecated.  We always want to do partial-history
operations.

> | ^^^ I think this could be broken up a lot more cleanly; introduce an
> | _iter_reverse_revision_history:
> |
> | def _iter_reverse_revision_history(self):
> |     if self._partial_history_cache is None:
> |         self._partial_history_cache = []
> |     for revision_id in self._partial_history_cache:
> |         yield revision_id
> |     for revision_id in self.repository._iter_reverse_revision_history(
> |         revision_id):
> |         self._partial_history_cache.append()
> |         yield revision_id
> |
> | And then get_rev_id is something like:
> | def get_rev_id(revno)
> |     cur_revno, last_revision = self.get_revision_info()
> |     if last_revno < revno:
> |         raise NoSuchRevision()
> |     for revision_id in self._iter_reverse_revision_history():
> |         if cur_revno == revno:
> |             return revision_id
> |         revno -= 1
> |     else:
> |         raise NoSuchRevision()
> 
> Except then you spend the time to iterate through a list instead of just
> using
> an integer offset. Why do "revno -= 1" 500 times when you can just say
> "oh, my
> list is 600 long, grab the 500th".
> 
> I'm happy to factor things around a little bit if you find the function
> hard to
> read, but I don't think we want to iterate when we can find the answer
> from a
> simple offset.
> 
> Also, we *should* be using _revision_history_cache if present.

No, I don't think we should use _revision_history_cache in new code.
Using _partial_history_cache scales just as well as using
_revision_history_cache.

Instead of "oh my list is 600 long, graph the 500", you do "oh my list
is 100 long, and the current revno is 600, so grab the 100th"

If _revision_history_cache is populated, then partial_history_cache
should also be populated.  And if _revision_history_cache isn't
populated, you have a performance win.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFH7nVP0F+nu1YWqI0RAj4CAJ0fnrtPPpqbdVshx7kN9cprWk+qEwCfQVEk
UV/F6Uy7NzXCNHMQviIpdgM=
=j0Y4
-----END PGP SIGNATURE-----