[merge] InterKnit.join() doesn't need the complete revision graph

Robert Collins robertc at robertcollins.net
Sun Sep 10 22:40:55 BST 2006


On Fri, 2006-09-08 at 20:27 -0500, John Arbash Meinel wrote:
> John Arbash Meinel wrote:

> > 
> > Also, would it be reasonable to use merge_sort() since that seems to be
> > faster than topo_sort?

It shouldn't be faster! If it is, then topo_sort can be made faster
still.
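For reference, a minimal topological sort over a {node: [parents]} graph looks something like the following (a sketch only - bzrlib's real topo_sort also handles cycles and ghost parents, and the dict shape is assumed, not taken from the patch):

```python
def topo_sort(graph):
    """Return the nodes of graph with every parent before its children.

    graph is assumed to be a dict mapping node -> list of parent nodes.
    Minimal sketch; no cycle or ghost detection.
    """
    result = []
    visited = set()

    def visit(node):
        # Depth-first: emit all ancestors of node before node itself.
        if node in visited:
            return
        visited.add(node)
        for parent in graph.get(node, []):
            visit(parent)
        result.append(node)

    for node in graph:
        visit(node)
    return result
```

Any ordering where parents precede children is acceptable, which is why the tests below check relative positions rather than one fixed output.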
 
> > John
> > =:->
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > === modified file 'bzrlib/knit.py'
> > --- bzrlib/knit.py	2006-09-06 21:56:58 +0000
> > +++ bzrlib/knit.py	2006-09-09 01:13:46 +0000
> > @@ -1601,7 +1601,7 @@
> >      
> >              if not needed_versions and not mismatched_versions:
> >                  return 0
> > -            full_list = topo_sort(self.source.get_graph())
> > +            full_list = topo_sort(self.source.get_graph(version_ids))
> >      
> >              version_list = [i for i in full_list if (not self.target.has_version(i)
> >                              and i in needed_versions)]
> > 

Does this change handle ghosts properly? If it passes the tests it
should, but my recollection is that ghost handling was the reason to
grab the full graph. Perhaps it's an accident of how the knit code came
together. Anyhow, if it's not the full list any more, please rename the
variable, and look a little wider in scope - you may find this is a
fully redundant lookup.

> Another thing which I just thought of. Which is we probably should
> change the second line to:
>             version_list = [v for v in full_list
>                             if (v in needed_versions
>                                 and not self.target.has_version(v))]
> 
> I'm guessing that "v in needed_versions" is going to be faster than
> "not self.target.has_version(v)"
> 
> Because
> 
> 1) has_version() is a complete function call with frame overhead, versus
> just a set lookup.
> 2) needed_versions is going to be a smaller set() for branches that
> are up to date with each other, which means fewer possible hash
> collisions. I know dict() and set() lookups are supposed to be
> approximately O(1), but I've found the constant factor varies: a
> nearly empty dict is faster to probe than a nearly full one. I don't
> know the specific curve, but I'm 90% positive that this is the case.
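The function-call-overhead point is easy to demonstrate with a quick timeit comparison (a hypothetical micro-benchmark, not part of the patch; has_version here is a stand-in, not bzrlib's implementation):

```python
import timeit


class Target(object):
    def __init__(self, versions):
        self._versions = set(versions)

    def has_version(self, version):
        # A full method call: frame setup dominates for a cheap check.
        return version in self._versions


needed_versions = set('v%d' % i for i in range(100))
target = Target('v%d' % i for i in range(100))

# Both timings pay one lambda call; the first adds an attribute lookup
# plus an extra Python frame for has_version.
call_time = timeit.timeit(lambda: target.has_version('v50'), number=100000)
set_time = timeit.timeit(lambda: 'v50' in needed_versions, number=100000)
```

On CPython the bare set membership test reliably comes out ahead of the method call, which is the cheaper-first ordering the reordered comprehension exploits.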

Why not add:

    def present_versions(self, versions):
        """Probe for a set of versions in the versioned file.

        :param versions: Versions which may be present.
        :return: The largest subset of versions whose contents are all
            in this versioned file.
        """
        versions = set(versions)
        return set(version for version in versions
                   if version in self._index.versions)


Or something like that: allow a set-like filter to get the interesting
versions - or, by set inversion, the missing ones - for the remote
side. (Or call it missing_versions ;)).
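Sketched as free functions, the filter and its inversion look like this (present_versions, missing_versions, and the `known` set are illustrative stand-ins, not bzrlib API):

```python
def present_versions(versions, known):
    """Largest subset of versions whose contents are in `known`.

    `known` stands in for the versioned file's index; a real method
    would consult self._index instead of taking it as a parameter.
    """
    return set(versions) & known


def missing_versions(versions, known):
    """Set inversion of present_versions: requested but absent."""
    return set(versions) - present_versions(versions, known)


known = set(['rev-1', 'rev-2'])
missing = missing_versions(['rev-1', 'rev-2', 'rev-3'], known)
```

The remote side then only has to ship the `missing` set, rather than probing has_version() once per candidate.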

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.