[MERGE][bug 248540] Handle ghosts in mainline a bit better

Mon Jun 1 21:42:58 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

bb:resubmit

Jelmer Vernooij wrote:
> === modified file 'bzrlib/branch.py'
> --- bzrlib/branch.py	2009-05-26 20:32:34 +0000
> +++ bzrlib/branch.py	2009-05-28 16:04:39 +0000
> @@ -1096,11 +1096,17 @@
>              # To figure out the revno for a random revision, we need to build
>              # the revision history, and count its length.
>              # We don't care about the order, just how long it is.
> -            # Alternatively, we could start at the current location, and count
> -            # backwards. But there is no guarantee that we will find it since
> -            # it may be a merged revision.
> -            revno = len(list(self.repository.iter_reverse_revision_history(
> -                                                                revision_id)))
> +            try:
> +                revno = len(list(self.repository.iter_reverse_revision_history(
> +                    revision_id)))
> +            except errors.RevisionNotPresent:
> +                # One of the left hand side ancestors is a ghost
> +                graph = self.repository.get_graph()
> +                try:
> +                    revno = graph.find_distance_to_null(revision_id, [])
> +                except errors.GhostRevisionsHaveNoRevno:
> +                    # Default to 1, if we can't find anything else
> +                    revno = 1

I think it would make sense to use find_distance_to_null all the time,
not ever using iter_reverse_revision_history here.  It will typically
yield results with fewer graph traversals than
iter_reverse_revision_history, provided it's properly seeded.

You should also seed it properly, by supplying [(source_revision_id,
source_revno)] as the known_revision_ids parameter.  This also lets you
remove the case for source_revision_id == revision_id, because
find_distance_to_null handles that trivially.

> @@ -1153,8 +1159,13 @@
>          """
>          mainline_parent_id = None
>          last_revno, last_revision_id = self.last_revision_info()
> -        real_rev_history = list(self.repository.iter_reverse_revision_history(
> -                                last_revision_id))
> +        try:
> +            real_rev_history = list(
> +                self.repository.iter_reverse_revision_history(last_revision_id))
> +        except errors.RevisionNotPresent:
> +            ret = BranchCheckResult(self)
> +            ret.ghosts_in_mainline = True
> +            return ret

I don't think you need to abort the check immediately here.  Instead,
you can store all available revisions in real_rev_history, and continue
with the checks.

> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py	2009-05-12 04:54:04 +0000
> +++ bzrlib/repository.py	2009-05-28 16:04:39 +0000
> @@ -2244,7 +2244,10 @@
>              # RevisionNotPresent here until we see a use for it, because of the
>              # cost in an inner loop that is by its very nature O(history).
>              # Robert Collins 20080326
> -            parents = graph.get_parent_map([next_id])[next_id]
> +            try:
> +                parents = graph.get_parent_map([next_id])[next_id]
> +            except KeyError:
> +                raise errors.RevisionNotPresent(next_id, self)

You should update or remove Robert's comment.  (Is there any observable
performance impact?)

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkokPU4ACgkQ0F+nu1YWqI0wAgCfQX+9YLTLHZByfGC/0Fteh2xp
awsAn2GHeVJwKBKhqv2EhHOLw3TTAZJr
=Rfri
-----END PGP SIGNATURE-----