[merge] InterKnit.join() doesn't need the complete revision graph
Robert Collins
robertc at robertcollins.net
Sun Sep 10 22:40:55 BST 2006
On Fri, 2006-09-08 at 20:27 -0500, John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> >
> > Also, would it be reasonable to use merge_sort() since that seems to be
> > faster than topo_sort?
It shouldn't be faster! If it is, then topo_sort can be made faster
still.
> > John
> > =:->
> >
> >
> > ------------------------------------------------------------------------
> >
> > === modified file 'bzrlib/knit.py'
> > --- bzrlib/knit.py 2006-09-06 21:56:58 +0000
> > +++ bzrlib/knit.py 2006-09-09 01:13:46 +0000
> > @@ -1601,7 +1601,7 @@
> >
> > if not needed_versions and not mismatched_versions:
> > return 0
> > - full_list = topo_sort(self.source.get_graph())
> > + full_list = topo_sort(self.source.get_graph(version_ids))
> >
> > version_list = [i for i in full_list if (not self.target.has_version(i)
> > and i in needed_versions)]
> >
Does this change handle ghosts properly? If it passes tests it should,
but thats my recollection of the reason to grab the full graph. Perhaps
this is a race condition with how the knit code came together. Anyhow,
if its not the full_list anymore, please rename the variable, and look a
little larger in scope - you may find this is a fully redundant lookup.
> Another thing which I just thought of. Which is we probably should
> change the second line to:
> version_list = [v for v in full_list
> if (i in needed_versions
> and (not self.target.has_version(i))]
>
> I'm guessing that "i in needed_versions" is going to be faster than "not
> self.target.has_version(i)"
>
> Because
>
> 1) has_version() is a complete function call with frame overhead, versus
> just a set lookup.
> 2) the needed_versions() is going to be a smaller set() for branches
> that are up-to-date with eachother. Which means there are fewer possible
> collisions. I know dict() and set() are supposed to be approx O(1), but
> I've found that they aren't. When a dict is really small it is faster
> than when it is really full. I don't know the specific curve, but 90%
> positive that this is the case.
Why not add '
def present_versions(self, versions):
"""Probe for a set of versions in the versioned file.
:param versions: Versions which may be present.
:return: The largest subset of versions whose contents are all
in this versioned file.
"""
versions = set(versions)
return set(version for version in versions if version in
self._index.versions)
Or something like that: Allow a set-like filter to get the
interesting/missing (by set inversion) versions for the remote side. (Or
call it missing_versions ;)).
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060911/053ac0f5/attachment.pgp
More information about the bazaar
mailing list