[MERGE] bzr check detects unreferenced revisions

Thu Aug 23 07:10:16 BST 2007

BB:tweak

I like this, I just would like it to be a little clearer, especially in
communicating the intent of the test.  Also, you don't don't see to report all
the problems you catch.  But basically I'm happy with this change.

Aaron Bentley wrote:
[...]
> 
> This is my output for bzr:
> 
>  15566 revisions
>  35221 unique file texts
> 5213198 repeated file texts
> vvv
>   2450 unreferenced file texts
> ^^^

It'd be good to tell users what this means, and what to do about it.  But this
is a good incremental step towards that.

I see you also list the unreferenced file texts if you pass --verbose, which
helps.

bzrlib/tests/repository_implementations/test_check.py
-----------------------------------------------------

[...]
> +class TestCheckFileVersions(repository_implementations.TestCaseWithRepository):
> +
> +    def create_damaged_repository(self):

It would be good to have a brief docstring on this method describing the way the
repository is damaged.  e.g.:

        """Create a damaged repository at '.'.

        Revision 'rev2' will have a file with ID 'file2' with inconsistent 
        data: the inventory will reference file2:rev1b, even though there is a
	file2:rev2 stored in the repository.  That is, file2:rev2 is an
        unreferenced file version.
        """

You could probably do a better job of explaining than I did.

This would save people like me that aren't deeply familiar with this stuff a few
minutes trying to figure out what's going on :)

Thanks for making a separate creation method for that though, it makes the test
much tidier and more readable.

> +    def test_check_file_versions(self):
> +        repo = self.create_damaged_repository()
> +        results = repo.check_file_versions(repo.revision_tree('rev1a'))
> +        self.assertEqual('missing-file', results[('TREE_ROOT', 'rev1a')])
> +        self.assertEqual('missing-version', results[('file1-id', 'rev1a')])
> +        self.assertIs(None, results.get(('file2-id', 'rev1a')))
> +        results = repo.check_file_versions(repo.revision_tree('rev2'))
> +        self.assertEqual('unreferenced-version',
> +                         results[('file2-id', 'rev2')])
> +        self.assertIs(None, results.get(('file3-id', 'rev2')))

Hmm, you seem to be testing several different behaviours here, without clearly
explaining the intent of any of them.  Maybe break this up in to multiple test
methods, each with a docstring explicitly describing on the condition you want
to test.

The risk with zero-comment test methods that do several things is that coverage
is likely to get lost in later revisions.  It's easy to miss stuff when the
original intent isn't crystal clear.

bzrlib/check.py
---------------

[...]
> +        problems = self.repository.check_file_versions(tree)
> +        for key, problem in problems.iteritems():
> +            if problem == 'unreferenced-version':
> +                self.unreferenced_file_versions.add(key)

Hmm, so nothing is reported for missing-file or missing-version?  That doesn't
seem right.

bzrlib/repository.py
--------------------

[...]
> +    def check_file_versions(self, revision_tree):
> +        """Check the storage of the file versions for specified revisions
> +

Nitpick: sentences should finish with a full stop.

> +        Iterates through pairs of (revision_id, result_dict).
> +        """

I don't understand what this means.  Is it describing the return value, the
implementation, or something else?  What is “result_dict”?  What is the return
value.

Ideally this docstring would have “:param revision_tree:” and “:returns:”
sections.

> +        revision_id = revision_tree.get_revision_id()
> +        inv = revision_tree.inventory
> +        problems = {}
> +        for path, entry in inv.iter_entries_by_dir():
> +            try:
> +                vf = self.weave_store.get_weave(entry.file_id,
> +                    self.get_transaction())
> +            except errors.NoSuchFile:
> +                problems[(entry.file_id, entry.revision)] = 'missing-file'
> +                continue
> +            if entry.revision not in vf:
> +                problems[(entry.file_id, entry.revision)] =\
> +                    'missing-version'
> +            if entry.revision != revision_id:
> +                if revision_id in vf:
> +                    problems[(entry.file_id, revision_id)] =\
> +                        'unreferenced-version'
> +        return problems

This looks good to me.

-Andrew.