Rev 4353: (andrew) Pass missing parent inventories check if all referenced texts are present in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Tue May 12 05:54:15 BST 2009


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4353
revision-id: pqm at pqm.ubuntu.com-20090512045404-3gjp1vrdmrjpwjgh
parent: pqm at pqm.ubuntu.com-20090512034718-8q4bonu5vryuwg45
parent: andrew.bennetts at canonical.com-20090512013608-0rkuiorp6zm2ikps
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2009-05-12 05:54:04 +0100
message:
  (andrew) Pass missing parent inventories check if all referenced texts are present
modified:
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
  bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.8
    revision-id: andrew.bennetts at canonical.com-20090512013608-0rkuiorp6zm2ikps
    parent: andrew.bennetts at canonical.com-20090511224622-yvbt3zhruutamyvt
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Tue 2009-05-12 11:36:08 +1000
    message:
      Remove old TODO.
    modified:
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.7
    revision-id: andrew.bennetts at canonical.com-20090511224622-yvbt3zhruutamyvt
    parent: andrew.bennetts at canonical.com-20090511223630-crp83zx31froq240
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Tue 2009-05-12 08:46:22 +1000
    message:
      Fix bug found by acceptance test: we need to flush writes (if we are buffering them) before trying to determine the missing_keys in _locked_insert_stream.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4309.1.6
    revision-id: andrew.bennetts at canonical.com-20090511223630-crp83zx31froq240
    parent: andrew.bennetts at canonical.com-20090511221427-s78n13gej60hqr2y
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Tue 2009-05-12 08:36:30 +1000
    message:
      Exit get_missing_parent_inventories early (without checking texts) if there are no missing parent inventories.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.5
    revision-id: andrew.bennetts at canonical.com-20090511221427-s78n13gej60hqr2y
    parent: andrew.bennetts at canonical.com-20090511083602-rbnn9438cyc0q13g
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Tue 2009-05-12 08:14:27 +1000
    message:
      Remove lots of cruft.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.4
    revision-id: andrew.bennetts at canonical.com-20090511083602-rbnn9438cyc0q13g
    parent: andrew.bennetts at canonical.com-20090511083152-o8bslc85cqhdqd04
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Mon 2009-05-11 18:36:02 +1000
    message:
      Remove some cruft.
    modified:
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.3
    revision-id: andrew.bennetts at canonical.com-20090511083152-o8bslc85cqhdqd04
    parent: andrew.bennetts at canonical.com-20090429095057-k8pa6looyziusuhn
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Mon 2009-05-11 18:31:52 +1000
    message:
      Start testing more cases, and start factoring those tests a little more clearly.
    modified:
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
    ------------------------------------------------------------
    revno: 4309.1.2
    revision-id: andrew.bennetts at canonical.com-20090429095057-k8pa6looyziusuhn
    parent: andrew.bennetts at canonical.com-20090429083826-qd48gnjw9pgj4c63
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Wed 2009-04-29 19:50:57 +1000
    message:
      Tentative fix for bug 368418: only fail the missing parent inventories check if there are missing texts that appear to be altered by the inventories with missing parents.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
    ------------------------------------------------------------
    revno: 4309.1.1
    revision-id: andrew.bennetts at canonical.com-20090429083826-qd48gnjw9pgj4c63
    parent: pqm at pqm.ubuntu.com-20090428170537-tbjh8ng5lrws1er9
    committer: Andrew Bennetts <andrew.bennetts at canonical.com>
    branch nick: all-referenced-texts-check
    timestamp: Wed 2009-04-29 18:38:26 +1000
    message:
      Track which keys referenced the missing parents.
    modified:
      bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2009-04-27 23:14:00 +0000
+++ b/bzrlib/knit.py	2009-04-29 09:50:57 +0000
@@ -2688,6 +2688,44 @@
         return key[:-1], key[-1]
 
 
+class _KeyRefs(object):
+
+    def __init__(self):
+        # dict mapping 'key' to 'set of keys referring to that key'
+        self.refs = {}
+
+    def add_references(self, key, refs):
+        # Record the new references
+        for referenced in refs:
+            try:
+                needed_by = self.refs[referenced]
+            except KeyError:
+                needed_by = self.refs[referenced] = set()
+            needed_by.add(key)
+        # Discard references satisfied by the new key
+        self.add_key(key)
+
+    def get_unsatisfied_refs(self):
+        return self.refs.iterkeys()
+
+    def add_key(self, key):
+        try:
+            del self.refs[key]
+        except KeyError:
+            # No keys depended on this key.  That's ok.
+            pass
+
+    def add_keys(self, keys):
+        for key in keys:
+            self.add_key(key)
+
+    def get_referrers(self):
+        result = set()
+        for referrers in self.refs.itervalues():
+            result.update(referrers)
+        return result
+
+
 class _KnitGraphIndex(object):
     """A KnitVersionedFiles index layered on GraphIndex."""
 
@@ -2723,9 +2761,9 @@
         self._is_locked = is_locked
         self._missing_compression_parents = set()
         if track_external_parent_refs:
-            self._external_parent_refs = set()
+            self._key_dependencies = _KeyRefs()
         else:
-            self._external_parent_refs = None
+            self._key_dependencies = None
 
     def __repr__(self):
         return "%s(%r)" % (self.__class__.__name__, self._graph_index)
@@ -2755,13 +2793,12 @@
 
         keys = {}
         compression_parents = set()
-        parent_refs = self._external_parent_refs
+        key_dependencies = self._key_dependencies
         for (key, options, access_memo, parents) in records:
             if self._parents:
                 parents = tuple(parents)
-                if parent_refs is not None:
-                    parent_refs.update(parents)
-                    parent_refs.discard(key)
+                if key_dependencies is not None:
+                    key_dependencies.add_references(key, parents)
             index, pos, size = access_memo
             if 'no-eol' in options:
                 value = 'N'
@@ -2829,12 +2866,11 @@
             new_missing = graph_index.external_references(ref_list_num=1)
             new_missing.difference_update(self.get_parent_map(new_missing))
             self._missing_compression_parents.update(new_missing)
-        if self._external_parent_refs is not None:
+        if self._key_dependencies is not None:
             # Add parent refs from graph_index (and discard parent refs that
             # the graph_index has).
             for node in graph_index.iter_all_entries():
-                self._external_parent_refs.update(node[3][0])
-                self._external_parent_refs.discard(node[1])
+                self._key_dependencies.add_references(node[1], node[3][0])
 
     def get_missing_compression_parents(self):
         """Return the keys of missing compression parents.
@@ -2847,9 +2883,9 @@
     def get_missing_parents(self):
         """Return the keys of missing parents."""
         # We may have false positives, so filter those out.
-        self._external_parent_refs.difference_update(
-            self.get_parent_map(self._external_parent_refs))
-        return frozenset(self._external_parent_refs)
+        self._key_dependencies.add_keys(
+            self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
+        return frozenset(self._key_dependencies.get_unsatisfied_refs())
 
     def _check_read(self):
         """raise if reads are not permitted."""

=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py	2009-05-07 05:08:46 +0000
+++ b/bzrlib/repository.py	2009-05-12 04:54:04 +0000
@@ -1448,7 +1448,31 @@
         unstacked_inventories = self.inventories._index
         present_inventories = unstacked_inventories.get_parent_map(
             key[-1:] for key in parents)
-        parents.difference_update(present_inventories)
+        if len(parents.difference(present_inventories)) == 0:
+            # No missing parent inventories.
+            return set()
+        # Ok, now we have a list of missing inventories.  But these only matter
+        # if the inventories that reference them are missing some texts they
+        # appear to introduce.
+        # XXX: Texts referenced by all added inventories need to be present,
+        # but at the moment we're only checking for texts referenced by
+        # inventories at the graph's edge.
+        key_deps = self.revisions._index._key_dependencies
+        key_deps.add_keys(present_inventories)
+        referrers = frozenset(r[0] for r in key_deps.get_referrers())
+        file_ids = self.fileids_altered_by_revision_ids(referrers)
+        missing_texts = set()
+        for file_id, version_ids in file_ids.iteritems():
+            missing_texts.update(
+                (file_id, version_id) for version_id in version_ids)
+        present_texts = self.texts.get_parent_map(missing_texts)
+        missing_texts.difference_update(present_texts)
+        if not missing_texts:
+            # No texts are missing, so all revisions and their deltas are
+            # reconstructable.
+            return set()
+        # Alternatively the text versions could be returned as the missing
+        # keys, but this is likely to be less data.
         missing_keys = set(('inventories', rev_id) for (rev_id,) in parents)
         return missing_keys
 
@@ -3993,6 +4017,7 @@
     def _locked_insert_stream(self, stream, src_format):
         to_serializer = self.target_repo._format._serializer
         src_serializer = src_format._serializer
+        new_pack = None
         if to_serializer == src_serializer:
             # If serializers match and the target is a pack repository, set the
             # write cache size on the new pack.  This avoids poor performance
@@ -4039,6 +4064,11 @@
                 self.target_repo.signatures.insert_record_stream(substream)
             else:
                 raise AssertionError('kaboom! %s' % (substream_type,))
+        # Done inserting data, and the missing_keys calculations will try to
+        # read back from the inserted data, so flush the writes to the new pack
+        # (if this is pack format).
+        if new_pack is not None:
+            new_pack._write_data('', flush=True)
         # Find all the new revisions (including ones from resume_tokens)
         missing_keys = self.target_repo.get_missing_parent_inventories()
         try:

=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- a/bzrlib/tests/per_repository/test_write_group.py	2009-04-21 07:22:04 +0000
+++ b/bzrlib/tests/per_repository/test_write_group.py	2009-05-12 01:36:08 +0000
@@ -120,6 +120,8 @@
         if token is not None:
             repo.leave_lock_in_place()
 
+class TestGetMissingParentInventories(TestCaseWithRepository):
+
     def test_empty_get_missing_parent_inventories(self):
         """A new write group has no missing parent inventories."""
         repo = self.make_repository('.')
@@ -131,63 +133,50 @@
             repo.commit_write_group()
             repo.unlock()
 
-    def test_get_missing_parent_inventories(self):
-        # Make a trunk with one commit.
-        if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
-            # RemoteRepository by default builds a default format real
-            # repository, but the default format is unstackble.  So explicitly
-            # make a stackable real repository and use that.
-            repo = self.make_repository('trunk', format='1.9')
-            repo = bzrdir.BzrDir.open(self.get_url('trunk')).open_repository()
-        else:
-            repo = self.make_repository('trunk')
-        if not repo._format.supports_external_lookups:
-            raise TestNotApplicable('format not stackable')
-        repo.bzrdir._format.set_branch_format(BzrBranchFormat7())
+    def branch_trunk_and_make_tree(self, trunk_repo, relpath):
+        tree = self.make_branch_and_memory_tree('branch')
+        trunk_repo.lock_read()
+        self.addCleanup(trunk_repo.unlock)
+        tree.branch.repository.fetch(trunk_repo, revision_id='rev-1')
+        tree.set_parent_ids(['rev-1'])
+        return tree 
+
+    def make_first_commit(self, repo):
         trunk = repo.bzrdir.create_branch()
-        trunk_repo = repo
         tree = memorytree.MemoryTree.create_on_branch(trunk)
         tree.lock_write()
-        if repo._format.rich_root_data:
-            # The tree needs a root
-            tree._inventory.add(InventoryDirectory('the-root-id', '', None))
+        tree.add([''], ['TREE_ROOT'], ['directory'])
+        tree.add(['dir'], ['dir-id'], ['directory'])
+        tree.add(['filename'], ['file-id'], ['file'])
+        tree.put_file_bytes_non_atomic('file-id', 'content\n')
+        tree.commit('Trunk commit', rev_id='rev-0')
         tree.commit('Trunk commit', rev_id='rev-1')
         tree.unlock()
-        # Branch the trunk, add a new commit.
-        tree = self.make_branch_and_tree('branch')
-        trunk_repo.lock_read()
-        self.addCleanup(trunk_repo.unlock)
-        tree.branch.repository.fetch(trunk_repo, revision_id='rev-1')
-        tree.set_parent_ids(['rev-1'])
+
+    def make_new_commit_in_new_repo(self, trunk_repo, parents=None):
+        tree = self.branch_trunk_and_make_tree(trunk_repo, 'branch')
+        tree.set_parent_ids(parents)
         tree.commit('Branch commit', rev_id='rev-2')
         branch_repo = tree.branch.repository
-        # Make a new repo stacked on trunk, and copy the new commit's revision
-        # and inventory records to it.
-        if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
-            # RemoteRepository by default builds a default format real
-            # repository, but the default format is unstackble.  So explicitly
-            # make a stackable real repository and use that.
-            repo = self.make_repository('stacked', format='1.9')
-            repo = bzrdir.BzrDir.open(self.get_url('stacked')).open_repository()
-        else:
-            repo = self.make_repository('stacked')
         branch_repo.lock_read()
         self.addCleanup(branch_repo.unlock)
-        repo.add_fallback_repository(trunk.repository)
-        repo.lock_write()
-        repo.start_write_group()
-        trunk_repo.lock_read()
-        repo.inventories.insert_record_stream(
-            branch_repo.inventories.get_record_stream(
-                [('rev-2',)], 'unordered', False))
-        repo.revisions.insert_record_stream(
-            branch_repo.revisions.get_record_stream(
-                [('rev-2',)], 'unordered', False))
-        self.assertEqual(
-            set([('inventories', 'rev-1')]),
-            repo.get_missing_parent_inventories())
-        # Revisions from resumed write groups can also cause missing parent
-        # inventories.
+        return branch_repo
+
+    def make_stackable_repo(self, relpath='trunk'):
+        if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
+            # RemoteRepository by default builds a default format real
+            # repository, but the default format is unstackble.  So explicitly
+            # make a stackable real repository and use that.
+            repo = self.make_repository(relpath, format='1.9')
+            repo = bzrdir.BzrDir.open(self.get_url(relpath)).open_repository()
+        else:
+            repo = self.make_repository(relpath)
+        if not repo._format.supports_external_lookups:
+            raise TestNotApplicable('format not stackable')
+        repo.bzrdir._format.set_branch_format(BzrBranchFormat7())
+        return repo
+
+    def reopen_repo_and_resume_write_group(self, repo):
         try:
             resume_tokens = repo.suspend_write_group()
         except errors.UnsuspendableWriteGroup:
@@ -201,9 +190,93 @@
         reopened_repo.lock_write()
         self.addCleanup(reopened_repo.unlock)
         reopened_repo.resume_write_group(resume_tokens)
+        return reopened_repo
+
+    def test_ghost_revision(self):
+        """A parent inventory may be absent if all the needed texts are present.
+        i.e., a ghost revision isn't (necessarily) considered to be a missing
+        parent inventory.
+        """
+        # Make a trunk with one commit.
+        trunk_repo = self.make_stackable_repo()
+        self.make_first_commit(trunk_repo)
+        trunk_repo.lock_read()
+        self.addCleanup(trunk_repo.unlock)
+        # Branch the trunk, add a new commit.
+        branch_repo = self.make_new_commit_in_new_repo(
+            trunk_repo, parents=['rev-1', 'ghost-rev'])
+        inv = branch_repo.get_inventory('rev-2')
+        # Make a new repo stacked on trunk, and then copy into it:
+        #  - all texts in rev-2
+        #  - the new inventory (rev-2)
+        #  - the new revision (rev-2)
+        repo = self.make_stackable_repo('stacked')
+        repo.lock_write()
+        repo.start_write_group()
+        # Add all texts from in rev-2 inventory.  Note that this has to exclude
+        # the root if the repo format does not support rich roots.
+        rich_root = branch_repo._format.rich_root_data
+        all_texts = [
+            (ie.file_id, ie.revision) for ie in inv.iter_just_entries()
+             if rich_root or inv.id2path(ie.file_id) != '']
+        repo.texts.insert_record_stream(
+            branch_repo.texts.get_record_stream(all_texts, 'unordered', False))
+        # Add inventory and revision for rev-2.
+        repo.add_inventory('rev-2', inv, ['rev-1', 'ghost-rev'])
+        repo.revisions.insert_record_stream(
+            branch_repo.revisions.get_record_stream(
+                [('rev-2',)], 'unordered', False))
+        # Now, no inventories are reported as missing, even though there is a
+        # ghost.
+        self.assertEqual(set(), repo.get_missing_parent_inventories())
+        # Resuming the write group does not affect
+        # get_missing_parent_inventories.
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+        self.assertEqual(set(), reopened_repo.get_missing_parent_inventories())
+        reopened_repo.abort_write_group()
+
+    def test_get_missing_parent_inventories(self):
+        """A stacked repo with a single revision and inventory (no parent
+        inventory) in it must have all the texts in its inventory (even if not
+        changed w.r.t. to the absent parent), otherwise it will report missing
+        texts/parent inventory.
+        
+        The core of this test is that a file was changed in rev-1, but in a
+        stacked repo that only has rev-2 
+        """
+        # Make a trunk with one commit.
+        trunk_repo = self.make_stackable_repo()
+        self.make_first_commit(trunk_repo)
+        trunk_repo.lock_read()
+        self.addCleanup(trunk_repo.unlock)
+        # Branch the trunk, add a new commit.
+        branch_repo = self.make_new_commit_in_new_repo(
+            trunk_repo, parents=['rev-1'])
+        inv = branch_repo.get_inventory('rev-2')
+        # Make a new repo stacked on trunk, and copy the new commit's revision
+        # and inventory records to it.
+        repo = self.make_stackable_repo('stacked')
+        repo.lock_write()
+        repo.start_write_group()
+        # Insert a single fulltext inv (using add_inventory because it's
+        # simpler than insert_record_stream)
+        repo.add_inventory('rev-2', inv, ['rev-1'])
+        repo.revisions.insert_record_stream(
+            branch_repo.revisions.get_record_stream(
+                [('rev-2',)], 'unordered', False))
+        # There should be no missing compression parents
+        self.assertEqual(set(),
+                repo.inventories.get_missing_compression_parent_keys())
+        self.assertEqual(
+            set([('inventories', 'rev-1')]),
+            repo.get_missing_parent_inventories())
+        # Resuming the write group does not affect
+        # get_missing_parent_inventories.
+        reopened_repo = self.reopen_repo_and_resume_write_group(repo)
         self.assertEqual(
             set([('inventories', 'rev-1')]),
             reopened_repo.get_missing_parent_inventories())
+        # Adding the parent inventory satisfies get_missing_parent_inventories.
         reopened_repo.inventories.insert_record_stream(
             branch_repo.inventories.get_record_stream(
                 [('rev-1',)], 'unordered', False))




More information about the bazaar-commits mailing list