[MERGE] Reconcile can fix bad parent references, Andrew's version
Robert Collins
robertc at robertcollins.net
Tue Oct 2 05:12:40 BST 2007
On Sat, 2007-09-29 at 00:38 +1000, Andrew Bennetts wrote:
> === added file
> 'bzrlib/tests/repository_implementations/test_broken.py'
> ...
> +"""Tests that use BrokenRepoScenario objects.
> +
> +That is, tests for reconcile and check.
> +"""
We try to group tests by the functionality they test. This appears to be
grouping by how the tests were written, which will make them hard to
find. I suggest
test_check_reconcile.py
(if you need tests of both functions to be conflated)
> +
> +from bzrlib.inventory import Inventory, InventoryFile
> +from bzrlib.revision import Revision
> +from bzrlib.tests import TestNotApplicable
> +from bzrlib.tests.repository_implementations import
> TestCaseWithRepository
> +
> +import sha
PEP-8, standard lib before local imports.
> +class TestReconcile(TestCaseWithRepository):
> + """Tests for how reconcile corrects errors in parents of file
> versions."""
PEP-8, one line if possible, or one and a paragraph... and the class
name does not imply the narrow focus the docstring does. We know this is
a test class so perhaps
class TestFileParentReconciliation
"""The tests of incorrect file-version parents reconciliation."""
> + def assertCheckScenario(self, scenario):
> + repo =
> self.make_repository_using_factory(scenario.populate_repository)
> + self.require_text_parent_corruption(repo)
> + check_result = repo.check()
> + check_result.report_results(verbose=True)
> + for pattern in scenario.check_regexes():
> + self.assertContainsRe(
> + self._get_log(keep_log_file=True),
> + pattern)
This method needs a docstring and a better name. either assert or check
is redundant. And scenario is incredibly generic here, if you want to go
hungarian (naming after what a thing is, not what its for), then perhaps
reconcile_scenario).
> + def make_repository_using_factory(self, factory):
> + """Create a new repository populated by the given factory."""
> + repo = self.make_repository('broken-repo')
> + repo.lock_write()
> + try:
> + repo.start_write_group()
> + try:
> + factory(repo)
> + repo.commit_write_group()
> + return repo
> + except:
> + repo.abort_write_group()
> + raise
> + finally:
> + repo.unlock()
Perhaps 'make_populated_repository' ?
> + def add_revision(self, repo, revision_id, inv, parent_ids):
> + inv.revision_id = revision_id
> + inv.root.revision = revision_id
> + repo.add_inventory(revision_id, inv, parent_ids)
> + revision = Revision(revision_id,
> committer='jrandom at example.com',
> + timestamp=0, inventory_sha1='', timezone=0,
> message='foo',
> + parent_ids=parent_ids)
> + repo.add_revision(revision_id,revision, inv)
This could do with a docstring.
> + def require_text_parent_corruption(self, repo):
> + if not repo._reconcile_fixes_text_parents:
> + raise TestNotApplicable(
> + "Format does not support text parent reconciliation")
perhaps tweak the name, 'require_repo_suffers_text_parent_corruption'.
> + def file_parents(self, repo, revision_id):
> + return repo.weave_store.get_weave('a-file-id',
> + repo.get_transaction()).get_parents(revision_id)
This looks like something worth putting onto Repository.
> + def assertReconcileResults(self, scenario):
> + """Construct a repository and reconcile it, verifying the state before
> + and after.
> +
> + :param scenario: a Scenario to test reconcile on.
> + """
going from the method name this is more than an assertion, it takes
active action.
I suggest
checkReconcileBehaviour
> + # The content of the versionedfile should be the same after
> the
> + # reconcile.
'The content of the versions in the versionedfile ...'
> + def test_reconcile(self):
> + self.assertReconcileResults(self.scenario_class(self))
> +
> + def test_check(self):
> + self.assertCheckScenario(self.scenario_class(self))
These methods are very opaque.
> === added file 'bzrlib/tests/repository_implementations/test_check.py'
> --- bzrlib/tests/repository_implementations/test_check.py 1970-01-01 00:00:00 +0000
> +++ bzrlib/tests/repository_implementations/test_check.py 2007-09-28 12:56:56 +0000
> @@ -0,0 +1,135 @@
> +# Copyright (C) 2007 Canonical Ltd
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> +
> +
> +"""Test operations that check the repository for corruption"""
> +
> +
> +from bzrlib import (
> + inventory,
> + revision as _mod_revision,
> + )
> +from bzrlib.repository import _RevisionTextVersionCache
> +from bzrlib.tests.repository_implementations import TestCaseWithRepository
> +
> +
> +class TestFindBadAncestors(TestCaseWithRepository):
> +
> + def make_broken_repository(self):
> + repo = self.make_repository('.')
> + cleanups = []
> + try:
> + repo.lock_write()
> + cleanups.append(repo.unlock)
> + repo.start_write_group()
> + cleanups.append(repo.commit_write_group)
> + # make rev1a: A well-formed revision, containing 'file1'
> + inv = inventory.Inventory(revision_id='rev1a')
> + inv.root.revision = 'rev1a'
> + self.add_file(repo, inv, 'file1', 'rev1a', [])
> + repo.add_inventory('rev1a', inv, [])
> + revision = _mod_revision.Revision('rev1a',
> + committer='jrandom at example.com', timestamp=0,
> + inventory_sha1='', timezone=0, message='foo', parent_ids=[])
> + repo.add_revision('rev1a',revision, inv)
> +
> + # make rev1b, which has no Revision, but has an Inventory, and
> + # file1
> + inv = inventory.Inventory(revision_id='rev1b')
> + inv.root.revision = 'rev1b'
> + self.add_file(repo, inv, 'file1', 'rev1b', [])
> + repo.add_inventory('rev1b', inv, [])
> +
> + # make rev2, with file1 and file2
> + # file2 is sane
> + # file1 has 'rev1b' as an ancestor, even though this is not
> + # mentioned by 'rev1a', making it an unreferenced ancestor
> + inv = inventory.Inventory()
> + self.add_file(repo, inv, 'file1', 'rev2', ['rev1a', 'rev1b'])
> + self.add_file(repo, inv, 'file2', 'rev2', [])
> + self.add_revision(repo, 'rev2', inv, ['rev1a'])
> +
> + # make ghost revision rev1c
> + inv = inventory.Inventory()
> + self.add_file(repo, inv, 'file2', 'rev1c', [])
> +
> + # make rev3 with file2
> + # file2 refers to 'rev1c', which is a ghost in this repository, so
> + # file2 cannot have rev1c as its ancestor.
> + inv = inventory.Inventory()
> + self.add_file(repo, inv, 'file2', 'rev3', ['rev1c'])
> + self.add_revision(repo, 'rev3', inv, ['rev1c'])
> + return repo
> + finally:
> + for cleanup in reversed(cleanups):
> + cleanup()
> +
> + def add_revision(self, repo, revision_id, inv, parent_ids):
> + inv.revision_id = revision_id
> + inv.root.revision = revision_id
> + repo.add_inventory(revision_id, inv, parent_ids)
> + revision = _mod_revision.Revision(revision_id,
> + committer='jrandom at example.com', timestamp=0, inventory_sha1='',
> + timezone=0, message='foo', parent_ids=parent_ids)
> + repo.add_revision(revision_id,revision, inv)
> +
> + def add_file(self, repo, inv, filename, revision, parents):
> + file_id = filename + '-id'
> + entry = inventory.InventoryFile(file_id, filename, 'TREE_ROOT')
> + entry.revision = revision
> + entry.text_size = 0
> + inv.add(entry)
> + vf = repo.weave_store.get_weave_or_empty(file_id,
> + repo.get_transaction())
> + vf.add_lines(revision, parents, ['line\n'])
The above logic looks very similar to that int he broken_repo tests, is
it duplicated?
> + def find_bad_ancestors(self, file_id, revision_ids):
> + repo = self.make_broken_repository()
> + vf = repo.weave_store.get_weave(file_id, repo.get_transaction())
> + return repo.find_bad_ancestors(revision_ids, file_id, vf,
> + _RevisionTextVersionCache(repo))
Going from your patch description this method and tests would seem to be
redundant now, perhaps delete it ?
> === modified file 'NEWS'
> --- NEWS 2007-09-24 09:40:01 +0000
> +++ NEWS 2007-09-26 10:49:46 +0000
> @@ -25,7 +25,10 @@
>
> FEATURES:
>
> - * New ``reconfigure`` command (Aaron Bentley)
> + * New ``reconfigure`` command. (Aaron Bentley)
> +
> + * New ``revert --forget-merges`` command, which removes the record of a pending
> + merge without affecting the working tree contents. (Martin Pool)
Why are these being changed by this patch?
> ...
> + * Fix 'unprintable error' message when displaying BzrCheckError and
> + some other exceptions on Python 2.5.
> + (Martin Pool, #144633)
And this?
> @@ -116,6 +123,10 @@
>
> INTERNALS:
>
> + * ``bzrlib.transport.Transport.put_file`` now returns the number of bytes
> + put by the method call, to allow avoiding stat-after-write or
> + housekeeping in callers. (Robert Collins)
And this?
> * New method ``bzrlib.osutils.minimum_path_selection`` useful for removing
> duplication from user input, when a user mentions both a path and an item
> contained within that path. (Robert Collins)
> @@ -333,6 +344,12 @@
> include them as Concepts within the User Reference.
> (Paul Moore, Ian Clatworthy)
>
> + * ``check`` can detect versionedfile parent references that are
> + inconsistent with revision and inventory info, and ``reconcile`` can fix
> + them. These faulty references were generated by 0.8-era releases,
> + so repositories which were manipulated by old bzrs should be
> + checked, and possibly reconciled ASAP. (Aaron Bentley)
...
> @@ -3048,19 +3063,24 @@
>
> _see_also = ['cat', 'export']
> takes_options = [
> - 'revision',
> - Option('no-backup', "Do not save backups of reverted files."),
> - ]
> + 'revision',
> + Option('no-backup', "Do not save backups of reverted files."),
> + Option('forget-merges',
> + 'Remove pending merge marker, without changing any files.'),
> + ]
> takes_args = ['file*']
>
> - def run(self, revision=None, no_backup=False, file_list=None):
> - if file_list is not None:
> - if len(file_list) == 0:
> - raise errors.BzrCommandError("No files specified")
> -
> + def run(self, revision=None, no_backup=False, file_list=None,
> + forget_merges=None):
> tree, file_list = tree_files(file_list)
> + if forget_merges:
> + tree.set_parent_ids(tree.get_parent_ids()[:1])
> + else:
> + self._revert_tree_to_revision(tree, revision, file_list, no_backup)
> +
> + @staticmethod
> + def _revert_tree_to_revision(tree, revision, file_list, no_backup):
> if revision is None:
> - # FIXME should be tree.last_revision
> rev_id = tree.last_revision()
> elif len(revision) != 1:
> raise errors.BzrCommandError('bzr revert --revision takes exactly 1 argument')
> @@ -3068,7 +3088,7 @@
> rev_id = revision[0].in_history(tree.branch).rev_id
> pb = ui.ui_factory.nested_progress_bar()
> try:
> - tree.revert(file_list,
> + tree.revert(file_list,
> tree.branch.repository.revision_tree(rev_id),
> not no_backup, pb, report_changes=True)
> finally:
Something is definately wrong in this bundle, there are changes here for
revert and other commands.
> === modified file 'bzrlib/check.py'
> --- bzrlib/check.py 2007-08-22 22:32:23 +0000
> +++ bzrlib/check.py 2007-09-28 08:18:16 +0000
> @@ -32,6 +32,7 @@
> # raising them. If there's more than one exception it'd be good to see them
> # all.
>
> +from bzrlib import repository as _mod_repository
> from bzrlib.errors import BzrCheckError
> import bzrlib.ui
> from bzrlib.trace import note
> @@ -53,6 +54,10 @@
> # maps (file-id, version) -> sha1; used by InventoryFile._check
> self.checked_texts = {}
> self.checked_weaves = {}
> + self.revision_versions = _mod_repository._RevisionTextVersionCache(
> + self.repository)
> + self.unreferenced_ancestors = set()
> + self.inconsistent_parents = []
>
> def check(self):
> self.repository.lock_read()
> @@ -64,36 +69,39 @@
> self.inventory_weave = self.repository.get_inventory_weave()
> self.plan_revisions()
> revno = 0
> - self.check_weaves()
> while revno < len(self.planned_revisions):
> rev_id = self.planned_revisions[revno]
> self.progress.update('checking revision', revno,
> len(self.planned_revisions))
> revno += 1
> self.check_one_rev(rev_id)
> + # check_weaves is done after the revision scan so that
> + # revision_versions is pre-populated
> + self.check_weaves()
> finally:
> self.progress.finished()
> self.repository.unlock()
>
> def plan_revisions(self):
> repository = self.repository
> - self.planned_revisions = set(repository.all_revision_ids())
> + self.planned_revisions = repository.all_revision_ids()
> self.progress.clear()
> inventoried = set(self.inventory_weave.versions())
> - awol = self.planned_revisions - inventoried
> + awol = set(self.planned_revisions) - inventoried
> if len(awol) > 0:
> raise BzrCheckError('Stored revisions missing from inventory'
> '{%s}' % ','.join([f for f in awol]))
> - self.planned_revisions = list(self.planned_revisions)
>
> def report_results(self, verbose):
> note('checked repository %s format %s',
> self.repository.bzrdir.root_transport,
> self.repository._format)
> note('%6d revisions', self.checked_rev_cnt)
> + note('%6d versionedfiles', len(self.checked_weaves))
I suggest this say '%6d file-ids'
> @@ -113,6 +121,19 @@
> note(' %s should be in the ancestry for:', link)
> for linker in linkers:
> note(' * %s', linker)
> + if verbose:
> + for file_id, revision_id in self.unreferenced_ancestors:
> + note('unreferenced ancestor: {%s} in %s', revision_id,
> + file_id)
This is a critical error, it should error, not output a warning.
> + if len(self.inconsistent_parents):
> + note('%6d inconsistent parents', len(self.inconsistent_parents))
> + if verbose:
> + for info in self.inconsistent_parents:
> + revision_id, file_id, found_parents, correct_parents = info
> + note(' * %s version %s has parents %r '
> + 'but should have %r'
> + % (file_id, revision_id, found_parents,
> + correct_parents))
This however, is fine.
> ...
> + result = w.find_bad_ancestors(
> + self.planned_revisions,
> + self.revision_versions.get_text_version,
> + weave_id,
> + revision_parents,
> + self.repository.get_graph())
> + for revision_id in result.iterkeys():
> + self.unreferenced_ancestors.add((weave_id, revision_id))
> self.checked_weaves[weave_id] = True
Why do you pass the weave id back into a method on the weave ? This
*really* looks like its barely related to the weave at all.
> def _check_revision_tree(self, rev_id):
> tree = self.repository.revision_tree(rev_id)
> + self.revision_versions.add_revision_text_version(tree)
I'd pluralise this method.
> inv = tree.inventory
> seen_ids = {}
> for file_id in inv:
But as it seems to be doing the same loop as this, for locality of
reference why not just accrue the id:versions in the loop and supply to
the cache afterwards.
> === modified file 'bzrlib/errors.py'
> --- bzrlib/errors.py 2007-09-13 01:54:49 +0000
> +++ bzrlib/errors.py 2007-09-25 06:29:46 +0000
> @@ -95,7 +95,11 @@
> try:
> fmt = self._get_format_string()
> if fmt:
> - s = fmt % self.__dict__
> + d = (self.__dict__)
> + # special case: python2.5 puts the 'message' attribute in a
> + # slot, so it isn't seen in __dict__
> + d['message'] = getattr(self, 'message', 'dummy message')
> + s = fmt % d
> # __str__() should always return a 'str' object
> # never a 'unicode' object.
> if isinstance(s, unicode):
This looks entirely unrelated.
> @@ -783,8 +787,8 @@
> # New code should prefer to raise specific subclasses
> def __init__(self, message):
> # Python 2.5 uses a slot for StandardError.message,
> - # so use a different variable name
> - # so it is exposed in self.__dict__
> + # so use a different variable name. We now work around this in
> + # BzrError.__str__, but this member name is kept for compatability.
> self.msg = message
And this comment here doesn't note that using '%(message)s %(msg)s' will
fail, so even if the code isn't new, its limited in an undocumented way.
> === modified file 'bzrlib/reconcile.py'
> --- bzrlib/reconcile.py 2007-09-12 04:21:51 +0000
> +++ bzrlib/reconcile.py 2007-09-28 04:40:46 +0000
> @@ -20,9 +20,14 @@
> __all__ = ['reconcile', 'Reconciler', 'RepoReconciler', 'KnitReconciler']
>
>
> -from bzrlib import ui
> +from bzrlib import (
> + errors,
> + graph,
> + ui,
> + repository,
> + )
> from bzrlib.trace import mutter
> -from bzrlib.tsort import TopoSorter
> +from bzrlib.tsort import TopoSorter, topo_sort
>
>
> def reconcile(dir, other=None):
> @@ -276,8 +281,8 @@
> """Perform the steps to reconcile this repository."""
> if self.thorough:
> self._load_indexes()
> - # knits never suffer this
> self._gc_inventory()
> + self._fix_text_parents()
>
> def _load_indexes(self):
> """Load indexes for the reconciliation."""
> @@ -337,3 +342,45 @@
> self.garbage_inventories = len(garbage)
> for revision_id in garbage:
> mutter('Garbage inventory {%s} found.', revision_id)
> +
> + def _fix_text_parents(self):
> + """Fix bad versionedfile parent entries.
> +
> + It is possible for the parents entry in a versionedfile entry to be
> + inconsistent with the values in the revision and inventory.
> +
> + This method finds entries with such inconsistencies, corrects their
> + parent lists, and replaces the versionedfile with a corrected version.
> + """
> + transaction = self.repo.get_transaction()
> + revision_parents = repository._RevisionParentsProvider(self.repo)
> + revision_graph = graph.Graph(revision_parents)
Why isn't this just
revision_graph = self.repo.get_graph() ?
> + if len(versions_with_bad_parents) == 0:
> + continue
I think from here to the end could be a separate method.
> + new_vf = self.repo.weave_store.get_empty('temp:%s' % file_id,
> + self.transaction)
> + new_parents = {}
> + for version in vf.versions():
> + if version in versions_with_bad_parents:
> + parents = versions_with_bad_parents[version][1]
> + else:
> + parents = vf.get_parents(version)
> + new_parents[version] = parents
> + for version in topo_sort(new_parents.items()):
> + new_vf.add_lines(version, new_parents[version],
> + vf.get_lines(version))
> + self.repo.weave_store.copy(new_vf, file_id, self.transaction)
> + self.repo.weave_store.delete('temp:%s' % file_id, self.transaction)
> === modified file 'bzrlib/repofmt/knitrepo.py'
> --- bzrlib/repofmt/knitrepo.py 2007-09-16 23:48:15 +0000
> +++ bzrlib/repofmt/knitrepo.py 2007-09-26 10:47:10 +0000
> @@ -69,6 +69,11 @@
> """Knit format repository."""
>
> _serializer = xml5.serializer_v5
> + def __init__(self, _format, a_bzrdir, control_files, _revision_store,
> + control_store, text_store):
> + MetaDirRepository.__init__(self, _format, a_bzrdir, control_files,
> + _revision_store, control_store, text_store)
> + self._reconcile_fixes_text_parents = True
PEP-8 - newline between _serializer and the def __init__.
>
> + def find_bad_ancestors(self, revision_ids, file_id, versionedfile,
> + revision_versions):
> + """Search the versionedfile for ancestors that are not referenced.
> +
> + The graph of a given versionedfile should be a subset of the graph
> + described by the repository's revisions. One possible deviation is
> + if a text's parents are not a subset of its revision's parents'
> + last-modified revisions. This deviation prevents
> + fileids_altered_by_revision_ids from correctly determining which
> + revisions of each text need to be fetched.
> +
> + This method detects this case.
> +
> + :param revision_ids: The revisions to scan for deviations
> + :file_id: The file-id of the versionedfile to scan
> + :versionedfile: The versionedfile to scan
> + :revision_versions: A dict that is a cache of last-modified revisions
> + of files for each version
> + :parents_provider: An implementation of ParentsProvider to use for
> + determining the revision graph's ancestry.
> + _RevisionParentsProvider is recommended for this purpose.
> + """
> + graph = self.get_graph()
> + return versionedfile.find_bad_ancestors(
> + revision_ids,
> + revision_versions.get_text_version,
> + file_id,
> + graph,
> + graph)
This method seems rather pointless now.
> +class _RevisionTextVersionCache(object):
> + """A cache of the versionedfile versions for revision and file-id"""
> +
> + def __init__(self, repository):
> + self.repository = repository
> + self.revision_versions = {}
> +
> + def add_revision_text_version(self, tree):
> + """Cache text version data from the supplied revision tree"""
> + inv_revisions = {}
> + for path, entry in tree.iter_entries_by_dir():
> + inv_revisions[entry.file_id] = entry.revision
> + self.revision_versions[tree.get_revision_id()] = inv_revisions
> + return inv_revisions
> +
> + def get_text_version(self, file_id, revision_id):
> + """Determine the text version for a given file-id and revision-id"""
> + try:
> + inv_revisions = self.revision_versions[revision_id]
> + except KeyError:
> + tree = self.repository.revision_tree(revision_id)
> + inv_revisions = self.add_revision_text_version(tree)
> + return inv_revisions.get(file_id)
This object does not look like a cache; it looks like a dict that will
autopopulate when you ask for something it does not know. So I suggest
not calling it a cache ;).
> +class _RevisionParentsProvider(object):
> + """A parents provider that uses a repositoy's revision objects.
> +
> + Should only be used when checking for corruption.
> +
> + For uncorrupt repositories, should give the same results as the repo's
> + get_parents implementation, except much more slowly.
> + """
> + def __init__(self, repo):
> + self._repo = repo
> + self._memoized = {}
> +
> + def get_parents(self, revision_ids):
> + parents_list = []
> + for revision_id in revision_ids:
> + try:
> + parents = self._memoized[revision_id]
> + except KeyError:
> + if revision_id == _mod_revision.NULL_REVISION:
> + parents = []
> + else:
> + try:
> + revision = self._repo.get_revision(revision_id)
> + parents = revision.parent_ids
> + except errors.NoSuchRevision:
> + parents = None
> + else:
> + if len(parents) == 0:
> + parents = [_mod_revision.NULL_REVISION]
> + self._memoized[revision_id] = parents
> + parents_list.append(parents)
> + return parents_list
Is this class now used or needed?
> === modified file 'bzrlib/tests/repository_implementations/__init__.py'
> --- bzrlib/tests/repository_implementations/__init__.py 2007-08-17 05:16:14 +0000
> +++ bzrlib/tests/repository_implementations/__init__.py 2007-09-28 14:07:01 +0000
> @@ -94,11 +94,410 @@
> relpath, format=format)
>
>
> +class BrokenRepositoryTestProviderAdapter(TestScenarioApplier):
> +
> + scenario_classes = [
> + FileParentIsNotInRevisionAncestryScenario,
> + FileParentHasInaccessibleInventoryScenario,
> + FileParentsNotReferencedByAnyInventoryScenario,
> + TooManyParentsScenario,
> + ClaimedFileParentDidNotModifyFileScenario,
> + IncorrectlyOrderedParentsScenario,
> + ]
> +
> + def __init__(self):
> + TestScenarioApplier.__init__(self)
> + self.scenarios = [
> + (s.__name__, {'scenario_class': s}) for s in
> self.scenario_classes]
I don't think a class is needed here. TestScenarioApplier is designed to
be usable without subclassing:
applier = TestScenarioApplier()
applier.scenarios = (s.__name__, {'scenario_class': s}) for s in
[
FileParentIsNotInRevisionAncestryScenario,
FileParentHasInaccessibleInventoryScenario,
FileParentsNotReferencedByAnyInventoryScenario,
TooManyParentsScenario,
ClaimedFileParentDidNotModifyFileScenario,
IncorrectlyOrderedParentsScenario,
])
> def test_suite():
> result = TestSuite()
> test_repository_implementations = [
> 'bzrlib.tests.repository_implementations.test_break_lock',
> + # note that test_broken is intentionally excluded from this list; it is
> + # handled further down.
You could make this more clear, and generic by creating all the appliers
you need, then making a list of tuples:
[(appliers, test_names), ...]
The stock tests are
((format_adapter,), [names])
The broken repo tests are
((format_adapter, broken_rep_applier), [test_broken_repo])
Then its fed straight into:
....
because theres a bunch of duplicate code at the moment.
> === modified file 'bzrlib/tests/repository_implementations/test_reconcile.py'
> --- bzrlib/tests/repository_implementations/test_reconcile.py 2007-08-22 05:28:32 +0000
> +++ bzrlib/tests/repository_implementations/test_reconcile.py 2007-09-28 14:09:52 +0000
> @@ -14,7 +14,7 @@
> # along with this program; if not, write to the Free Software
> # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>
> -"""Tests for reconiliation of repositories."""
> +"""Tests for reconciliation of repositories."""
>
>
> import bzrlib
> @@ -23,10 +23,11 @@
> from bzrlib.reconcile import reconcile, Reconciler
> from bzrlib.revision import Revision
> from bzrlib.tests import TestSkipped
> -from bzrlib.tests.repository_implementations.test_repository import TestCaseWithRepository
> +from bzrlib.tests.repository_implementations.test_repository import (
> + TestCaseWithRepository,
> + )
> from bzrlib.transport import get_transport
> from bzrlib.uncommit import uncommit
> -from bzrlib.workingtree import WorkingTree
>
>
> class TestReconcile(TestCaseWithRepository):
> @@ -374,3 +375,29 @@
> repo = d.open_repository()
> self.checkUnreconciled(d, repo.reconcile())
> self.checkUnreconciled(d, repo.reconcile(thorough=True))
> +
> +
> +
> +#commiting rev X, row 0 local change, row 1 no local change against at least one parent
> +#rev parents X.revision inv.revision foreach parent knit parents label
> +#revA X revA revA parent-changed
> +#revA A revA NO ENTRY
> +#revA X revB revB parent-changed
> +#revA A revB NO ENTRY
> +#revA revB X revC revC revC parent changed (C in ancestory of A,B)
> +#revA revB C revC revC NO ENTRY
> +#revA revB X revA revC revA left-side change (C in ancestry of A)
> +#revA revB A revA revC NO ENTRY
> +#revA revB X revC revB revB right-side change (C in ancestry of B)
> +#revA revB B revC revB NO ENTRY
> +#revA revB X revA revB revA revB both-side change no prior join
> +#revA revB X revA revB revA revB both-side change no prior join (that is, we always record a change in this case)
> +#revA revB X revA revB revA both-side change, one side pre-merged (A merged B)
> +#revA revB A revA revB NO ENTRY
> +#revA revB X revC revD revC revD both-side parent change no prior join
> +#revA revB X revC revD revC revD both-side parent change no prior join (that is we always record a change in this case)
> +#revA revB X revC revD revC both sides parent change, C merged D
> +#revA revB C revC revD NO ENTRY
This matrix should go, or be documented for other readers.
> +
> + def test_check_error(self):
> + # This has a member called 'message', which is problematic in
> + # python2.5 because that is a slot on the base Exception class
> + e = errors.BzrCheckError('example check failure')
> + self.assertEqual(
> + "Internal check failed: example check failure",
> + str(e))
> + self.assertTrue(e.internal_error)
I don't think this test needs the commentary: all our exceptions should
be tested that they render successfully.
> === modified file 'bzrlib/versionedfile.py'
> --- bzrlib/versionedfile.py 2007-09-20 06:12:51 +0000
> +++ bzrlib/versionedfile.py 2007-09-28 05:53:43 +0000
> @@ -37,6 +37,7 @@
>
> from bzrlib.inter import InterObject
> from bzrlib.textmerge import TextMerge
> +from bzrlib.trace import mutter
>
>
> class VersionedFile(object):
> @@ -495,6 +496,111 @@
> b_marker=TextMerge.B_MARKER):
> return PlanWeaveMerge(plan, a_marker, b_marker).merge_lines()[0]
>
> + def calculate_parents(self, revision_id, get_text_version, file_id,
> + parents_provider, repo_graph, get_inventory):
> + text_revision = get_text_version(file_id, revision_id)
> + if text_revision is None:
> + return None
> + parents_of_text_revision = parents_provider.get_parents(
> + [text_revision])[0]
> + parents_from_inventories = []
> + for parent in parents_of_text_revision:
> + if parent == revision.NULL_REVISION:
> + continue
> + try:
> + inventory = get_inventory(parent)
> + except errors.RevisionNotPresent:
> + pass
> + else:
> + introduced_in = inventory[file_id].revision
> + parents_from_inventories.append(introduced_in)
> + mutter('%r:%r introduced in: %r',
> + file_id, revision_id, parents_from_inventories)
> + heads = set(repo_graph.heads(parents_from_inventories))
> + mutter(' heads: %r', heads)
> + new_parents = []
> + for parent in parents_from_inventories:
> + if parent in heads and parent not in new_parents:
> + new_parents.append(parent)
> + return new_parents
> +
> + def check_parents(self, revision_ids, get_text_version, file_id,
> + parents_provider, repo_graph, get_inventory):
> + result = {}
> + for num, revision_id in enumerate(revision_ids):
> + correct_parents = self.calculate_parents(revision_id,
> + get_text_version, file_id, parents_provider, repo_graph,
> + get_inventory)
> + if correct_parents is None:
> + continue
> + text_revision = get_text_version(file_id, revision_id)
> + knit_parents = self.get_parents(text_revision)
> + if correct_parents != knit_parents:
> + result[revision_id] = (knit_parents, correct_parents)
> + mutter(' RESULT: %r', result)
> + return result
> +
> + def find_bad_ancestors(self, revision_ids, get_text_version, file_id,
> + parents_provider, repo_graph):
> + """Search this versionedfile for ancestors that are not referenced.
> +
> + One possible deviation is if a text's parents are not a subset of its
> + revision's parents' last-modified revisions. This deviation prevents
> + fileids_altered_by_revision_ids from correctly determining which
> + revisions of each text need to be fetched.
> +
> + This method detects this case.
> +
> + :param revision_ids: The revisions to scan for deviations
> + :param file_id: The file-id of the versionedfile to scan
> + :param get_text_version: a callable that takes two arguments,
> + file_id and a revision_id, and returns the id of text version of
> + that file in that revision.
> + :param parents_provider: An implementation of ParentsProvider to use
> + for determining the revision graph's ancestry.
> + _RevisionParentsProvider is recommended for this purpose.
> +
> + :returns: a dict mapping bad parents to a set of revisions they occur
> + in.
> + """
> + result = {}
> + from bzrlib.trace import mutter
> + for num, revision_id in enumerate(revision_ids):
> +
> + #if revision_id == 'broken-revision-1-2': import pdb; pdb.set_trace()
> + #if revision_id == 'broken-revision-1-2':
> + # result.setdefault('parent-1',set()).add('broken-revision-1-2')
> + # result.setdefault('parent-2',set()).add('broken-revision-1-2')
> + text_revision = get_text_version(file_id, revision_id)
> + if text_revision is None:
> + continue
> +
> + file_parents = parents_provider.get_parents([text_revision])[0]
> + revision_parents = set()
> + for parent_id in file_parents:
> + try:
> + revision_parents.add(get_text_version(file_id, parent_id))
> + # Skip ghosts (this means they can't provide texts...)
> + except errors.RevisionNotPresent:
> + continue
> + # XXX:
> + knit_parents = set(self.get_parents(text_revision))
> + unreferenced = knit_parents.difference(revision_parents)
> + for unreferenced_id in unreferenced:
> + result.setdefault(unreferenced_id, set()).add(text_revision)
> +
> + correct_parents = tuple(repo_graph.heads(knit_parents))
> + spurious_parents = knit_parents.difference(correct_parents)
> + for spurious_parent in spurious_parents:
> + result.setdefault(spurious_parent, set()).add(text_revision)
> + # XXX: false positives
> + #text_parents = self.get_parents(text_revision)
> + #if text_parents != file_parents:
> + # for text_parent in text_parents:
> + # result.setdefault(text_parent, set()).add(text_revision)
> + mutter('find_bad_ancestors: %r', result)
> + return result
> +
These methods are definately in the wrong place.
In summary there are three methods:
calculate_parents
check_parents
find_bad_ancestors
I don't know whats different between the last two - why are there three
methods and not two?
In terms of the right home for this, VersionedFile is the wrong place,
because it conflates delta storage with per-file graph caching. Knits
are not able to decouple these without a disk change, but packs do
decouple (at least on disk) this.
CommitBuilder currently is the right place to calculate what the
per-file graph parents for a text version are; I'm not sure what the
right place for versioned file is; but the fact that calculate_parents
does not use self at all, and check_parents only uses self to get the
parents from the knit, and to access calculate_parents is a pretty clear
smell to me.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071002/6040d005/attachment-0001.pgp
More information about the bazaar
mailing list