[MERGE][0.92] Fix bug 155730: force file versions with unreferenced parents to be stored as fulltexts.

Andrew Bennetts andrew at canonical.com
Fri Oct 26 08:06:38 BST 2007


Ian Clatworthy wrote:
> Andrew Bennetts wrote:
[...]
> 
> The original bug mentioned that 'bzr check' never complained even though
> a genuine problem existed. This change fixes check.py to exist the new
> API but still isn't reporting the issue as best I can see. In other
> words, dangling_versions being non-empty ought to trigger some output I
> think.

I've made it report this information (as “N file versions are not referenced by
their inventory”).

> > === modified file 'bzrlib/repository.py'
> > --- bzrlib/repository.py	2007-10-19 17:00:10 +0000
> > +++ bzrlib/repository.py	2007-10-25 00:14:05 +0000
> > @@ -2535,6 +2535,19 @@
> >              self.revision_parents[revision_id] = parents
> >              return parents
> >  
> > +    def used_file_versions(self):
> > +        """Return a set of (revision_id, file_id) pairs for each file version
> > +        referenced by any inventory cached by this _RevisionTextVersionCache.
> > +
> > +        If the entire repository has been cached, this can be used to find all
> 
> From a API design perspective, I don't like this. If someone 'forgot' to
> cache the entire repository by calling populate_revs, then they would be
> processing a list of versions thinking they were complete when they
> weren't. The consequences might be nasty given the sort of code we're
> talking about.
> 
> It's probably ok to leave this method but I'd add one called
> all_file_versions which explicitly checked the cache was fully primed.
> Alternatively, reword the docstring to say "This method ensures the
> entire repository is cached and ...".

I think it's ok as is.  The docstring makes it clear what the limitations and
assumptions are, and there's no easy way to check here that the entire
repository has been cached.

I share your concern, but I don't see a better alternative at the moment.

> >      def check_file_version_parents(self, weave, file_id):
> > -        result = {}
> > +        """Check the parents stored in a versioned file are correct.
> > +
> > +        It also detects file versions that are not referenced by their
> > +        corresponding revision's inventory.
> > +
> > +        :returns: A tuple of (wrong_parents, dangling_file_version).
> > +            wrong_parents is a dict mapping {revision_id: (stored_parents,
> > +            correct_parents)} for each revision_id where the stored parents
> > +            are not correct.  dangling_file_version is a set of revision_ids
> > +            for versions that are present in this versioned file, but not used
> > +            by the corresponding inventory.
> > +        """
> > +        wrong_parents = {}
> > +        dangling_file_version = set()
> 
> dangling_file_versions (plural) would be clearer, both in the docstring
> and the code. Code using the API uses the plural name already BTW.

Good point.  Done.

> >              vf_checker = self.repo.get_versioned_file_checker(
> > -                versions, revision_versions)
> > -            versions_with_bad_parents = vf_checker.check_file_version_parents(
> > -                vf, file_id)
> > -            if len(versions_with_bad_parents) == 0:
> > +                vf.versions(), revision_versions)
> > +            versions_with_bad_parents, dangling_file_versions = \
> > +                vf_checker.check_file_version_parents(vf, file_id)
> 
> Buried in that change, get_versioned_file_checker now takes
> vf.versions() as it 1st arg rather than versions (which was obtained
> from the revision_store). Is this deliberate or doesn't it matter? If
> there's a subtle difference, then it might be worth a comment mentioning
> why the vf.versions() must be used now.

This is intentional: there's no point checking versions of a file that we don't
have.  It's a minor optimisation.

> > +            full_text_versions = set()
> > +            unused_versions = set()
> > +            if (len(versions_with_bad_parents) == 0 and
> > +                len(dangling_file_versions) == 0):
> >                  continue
> 
> The if stmt should go before the 2 initialisation statements here.

Done.

> > +        :returns: A tuple of (wrong_parents, dangling_file_version).
> > +            wrong_parents is a dict mapping {revision_id: (stored_parents,
> > +            correct_parents)} for each revision_id where the stored parents
> > +            are not correct.  dangling_file_version is a set of revision_ids
> 
> Actually, it a set of (file-id,revision-id) tuples.

Oops.  Fixed.

> >      def all_versions(self):
> > -        return ['rev1a', 'rev2', 'rev4', 'rev2b', 'rev4', 'rev2c', 'rev5']
> > +        return ['rev1a', 'rev2c', 'rev4', 'rev5']
> 
> I originally thought this was a bad merge because 'rev2' etc clearly are
> in the list of versions in the populated_parents and corrected_parents
> given immediately below. Renaming this from all_versions to something
> more explicit as discussed on IRC would be good.

I've renamed this to “all_versions_after_reconcile”.

Thanks very much for the review!

-Andrew.




More information about the bazaar mailing list