Rev 4353: (andrew) Pass missing parent inventories check if all referenced texts are present in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Tue May 12 05:54:15 BST 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4353
revision-id: pqm at pqm.ubuntu.com-20090512045404-3gjp1vrdmrjpwjgh
parent: pqm at pqm.ubuntu.com-20090512034718-8q4bonu5vryuwg45
parent: andrew.bennetts at canonical.com-20090512013608-0rkuiorp6zm2ikps
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2009-05-12 05:54:04 +0100
message:
(andrew) Pass missing parent inventories check if all referenced texts are present
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.8
revision-id: andrew.bennetts at canonical.com-20090512013608-0rkuiorp6zm2ikps
parent: andrew.bennetts at canonical.com-20090511224622-yvbt3zhruutamyvt
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Tue 2009-05-12 11:36:08 +1000
message:
Remove old TODO.
modified:
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.7
revision-id: andrew.bennetts at canonical.com-20090511224622-yvbt3zhruutamyvt
parent: andrew.bennetts at canonical.com-20090511223630-crp83zx31froq240
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Tue 2009-05-12 08:46:22 +1000
message:
Fix bug found by acceptance test: we need to flush writes (if we are buffering them) before trying to determine the missing_keys in _locked_insert_stream.
modified:
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
------------------------------------------------------------
revno: 4309.1.6
revision-id: andrew.bennetts at canonical.com-20090511223630-crp83zx31froq240
parent: andrew.bennetts at canonical.com-20090511221427-s78n13gej60hqr2y
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Tue 2009-05-12 08:36:30 +1000
message:
Exit get_missing_parent_inventories early (without checking texts) if there are no missing parent inventories.
modified:
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.5
revision-id: andrew.bennetts at canonical.com-20090511221427-s78n13gej60hqr2y
parent: andrew.bennetts at canonical.com-20090511083602-rbnn9438cyc0q13g
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Tue 2009-05-12 08:14:27 +1000
message:
Remove lots of cruft.
modified:
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.4
revision-id: andrew.bennetts at canonical.com-20090511083602-rbnn9438cyc0q13g
parent: andrew.bennetts at canonical.com-20090511083152-o8bslc85cqhdqd04
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Mon 2009-05-11 18:36:02 +1000
message:
Remove some cruft.
modified:
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.3
revision-id: andrew.bennetts at canonical.com-20090511083152-o8bslc85cqhdqd04
parent: andrew.bennetts at canonical.com-20090429095057-k8pa6looyziusuhn
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Mon 2009-05-11 18:31:52 +1000
message:
Start testing more cases, and start factoring those tests a little more clearly.
modified:
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/tests/per_repository/test_write_group.py test_write_group.py-20070716105516-89n34xtogq5frn0m-1
------------------------------------------------------------
revno: 4309.1.2
revision-id: andrew.bennetts at canonical.com-20090429095057-k8pa6looyziusuhn
parent: andrew.bennetts at canonical.com-20090429083826-qd48gnjw9pgj4c63
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Wed 2009-04-29 19:50:57 +1000
message:
Tentative fix for bug 368418: only fail the missing parent inventories check if there are missing texts that appear to be altered by the inventories with missing parents.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
------------------------------------------------------------
revno: 4309.1.1
revision-id: andrew.bennetts at canonical.com-20090429083826-qd48gnjw9pgj4c63
parent: pqm at pqm.ubuntu.com-20090428170537-tbjh8ng5lrws1er9
committer: Andrew Bennetts <andrew.bennetts at canonical.com>
branch nick: all-referenced-texts-check
timestamp: Wed 2009-04-29 18:38:26 +1000
message:
Track which keys referenced the missing parents.
modified:
bzrlib/knit.py knit.py-20051212171256-f056ac8f0fbe1bd9
=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py 2009-04-27 23:14:00 +0000
+++ b/bzrlib/knit.py 2009-04-29 09:50:57 +0000
@@ -2688,6 +2688,44 @@
return key[:-1], key[-1]
+class _KeyRefs(object):
+
+ def __init__(self):
+ # dict mapping 'key' to 'set of keys referring to that key'
+ self.refs = {}
+
+ def add_references(self, key, refs):
+ # Record the new references
+ for referenced in refs:
+ try:
+ needed_by = self.refs[referenced]
+ except KeyError:
+ needed_by = self.refs[referenced] = set()
+ needed_by.add(key)
+ # Discard references satisfied by the new key
+ self.add_key(key)
+
+ def get_unsatisfied_refs(self):
+ return self.refs.iterkeys()
+
+ def add_key(self, key):
+ try:
+ del self.refs[key]
+ except KeyError:
+ # No keys depended on this key. That's ok.
+ pass
+
+ def add_keys(self, keys):
+ for key in keys:
+ self.add_key(key)
+
+ def get_referrers(self):
+ result = set()
+ for referrers in self.refs.itervalues():
+ result.update(referrers)
+ return result
+
+
class _KnitGraphIndex(object):
"""A KnitVersionedFiles index layered on GraphIndex."""
@@ -2723,9 +2761,9 @@
self._is_locked = is_locked
self._missing_compression_parents = set()
if track_external_parent_refs:
- self._external_parent_refs = set()
+ self._key_dependencies = _KeyRefs()
else:
- self._external_parent_refs = None
+ self._key_dependencies = None
def __repr__(self):
return "%s(%r)" % (self.__class__.__name__, self._graph_index)
@@ -2755,13 +2793,12 @@
keys = {}
compression_parents = set()
- parent_refs = self._external_parent_refs
+ key_dependencies = self._key_dependencies
for (key, options, access_memo, parents) in records:
if self._parents:
parents = tuple(parents)
- if parent_refs is not None:
- parent_refs.update(parents)
- parent_refs.discard(key)
+ if key_dependencies is not None:
+ key_dependencies.add_references(key, parents)
index, pos, size = access_memo
if 'no-eol' in options:
value = 'N'
@@ -2829,12 +2866,11 @@
new_missing = graph_index.external_references(ref_list_num=1)
new_missing.difference_update(self.get_parent_map(new_missing))
self._missing_compression_parents.update(new_missing)
- if self._external_parent_refs is not None:
+ if self._key_dependencies is not None:
# Add parent refs from graph_index (and discard parent refs that
# the graph_index has).
for node in graph_index.iter_all_entries():
- self._external_parent_refs.update(node[3][0])
- self._external_parent_refs.discard(node[1])
+ self._key_dependencies.add_references(node[1], node[3][0])
def get_missing_compression_parents(self):
"""Return the keys of missing compression parents.
@@ -2847,9 +2883,9 @@
def get_missing_parents(self):
"""Return the keys of missing parents."""
# We may have false positives, so filter those out.
- self._external_parent_refs.difference_update(
- self.get_parent_map(self._external_parent_refs))
- return frozenset(self._external_parent_refs)
+ self._key_dependencies.add_keys(
+ self.get_parent_map(self._key_dependencies.get_unsatisfied_refs()))
+ return frozenset(self._key_dependencies.get_unsatisfied_refs())
def _check_read(self):
"""raise if reads are not permitted."""
=== modified file 'bzrlib/repository.py'
--- a/bzrlib/repository.py 2009-05-07 05:08:46 +0000
+++ b/bzrlib/repository.py 2009-05-12 04:54:04 +0000
@@ -1448,7 +1448,31 @@
unstacked_inventories = self.inventories._index
present_inventories = unstacked_inventories.get_parent_map(
key[-1:] for key in parents)
- parents.difference_update(present_inventories)
+ if len(parents.difference(present_inventories)) == 0:
+ # No missing parent inventories.
+ return set()
+ # Ok, now we have a list of missing inventories. But these only matter
+ # if the inventories that reference them are missing some texts they
+ # appear to introduce.
+ # XXX: Texts referenced by all added inventories need to be present,
+ # but at the moment we're only checking for texts referenced by
+ # inventories at the graph's edge.
+ key_deps = self.revisions._index._key_dependencies
+ key_deps.add_keys(present_inventories)
+ referrers = frozenset(r[0] for r in key_deps.get_referrers())
+ file_ids = self.fileids_altered_by_revision_ids(referrers)
+ missing_texts = set()
+ for file_id, version_ids in file_ids.iteritems():
+ missing_texts.update(
+ (file_id, version_id) for version_id in version_ids)
+ present_texts = self.texts.get_parent_map(missing_texts)
+ missing_texts.difference_update(present_texts)
+ if not missing_texts:
+ # No texts are missing, so all revisions and their deltas are
+ # reconstructable.
+ return set()
+ # Alternatively the text versions could be returned as the missing
+ # keys, but this is likely to be less data.
missing_keys = set(('inventories', rev_id) for (rev_id,) in parents)
return missing_keys
@@ -3993,6 +4017,7 @@
def _locked_insert_stream(self, stream, src_format):
to_serializer = self.target_repo._format._serializer
src_serializer = src_format._serializer
+ new_pack = None
if to_serializer == src_serializer:
# If serializers match and the target is a pack repository, set the
# write cache size on the new pack. This avoids poor performance
@@ -4039,6 +4064,11 @@
self.target_repo.signatures.insert_record_stream(substream)
else:
raise AssertionError('kaboom! %s' % (substream_type,))
+ # Done inserting data, and the missing_keys calculations will try to
+ # read back from the inserted data, so flush the writes to the new pack
+ # (if this is pack format).
+ if new_pack is not None:
+ new_pack._write_data('', flush=True)
# Find all the new revisions (including ones from resume_tokens)
missing_keys = self.target_repo.get_missing_parent_inventories()
try:
=== modified file 'bzrlib/tests/per_repository/test_write_group.py'
--- a/bzrlib/tests/per_repository/test_write_group.py 2009-04-21 07:22:04 +0000
+++ b/bzrlib/tests/per_repository/test_write_group.py 2009-05-12 01:36:08 +0000
@@ -120,6 +120,8 @@
if token is not None:
repo.leave_lock_in_place()
+class TestGetMissingParentInventories(TestCaseWithRepository):
+
def test_empty_get_missing_parent_inventories(self):
"""A new write group has no missing parent inventories."""
repo = self.make_repository('.')
@@ -131,63 +133,50 @@
repo.commit_write_group()
repo.unlock()
- def test_get_missing_parent_inventories(self):
- # Make a trunk with one commit.
- if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
- # RemoteRepository by default builds a default format real
- # repository, but the default format is unstackble. So explicitly
- # make a stackable real repository and use that.
- repo = self.make_repository('trunk', format='1.9')
- repo = bzrdir.BzrDir.open(self.get_url('trunk')).open_repository()
- else:
- repo = self.make_repository('trunk')
- if not repo._format.supports_external_lookups:
- raise TestNotApplicable('format not stackable')
- repo.bzrdir._format.set_branch_format(BzrBranchFormat7())
+ def branch_trunk_and_make_tree(self, trunk_repo, relpath):
+ tree = self.make_branch_and_memory_tree('branch')
+ trunk_repo.lock_read()
+ self.addCleanup(trunk_repo.unlock)
+ tree.branch.repository.fetch(trunk_repo, revision_id='rev-1')
+ tree.set_parent_ids(['rev-1'])
+ return tree
+
+ def make_first_commit(self, repo):
trunk = repo.bzrdir.create_branch()
- trunk_repo = repo
tree = memorytree.MemoryTree.create_on_branch(trunk)
tree.lock_write()
- if repo._format.rich_root_data:
- # The tree needs a root
- tree._inventory.add(InventoryDirectory('the-root-id', '', None))
+ tree.add([''], ['TREE_ROOT'], ['directory'])
+ tree.add(['dir'], ['dir-id'], ['directory'])
+ tree.add(['filename'], ['file-id'], ['file'])
+ tree.put_file_bytes_non_atomic('file-id', 'content\n')
+ tree.commit('Trunk commit', rev_id='rev-0')
tree.commit('Trunk commit', rev_id='rev-1')
tree.unlock()
- # Branch the trunk, add a new commit.
- tree = self.make_branch_and_tree('branch')
- trunk_repo.lock_read()
- self.addCleanup(trunk_repo.unlock)
- tree.branch.repository.fetch(trunk_repo, revision_id='rev-1')
- tree.set_parent_ids(['rev-1'])
+
+ def make_new_commit_in_new_repo(self, trunk_repo, parents=None):
+ tree = self.branch_trunk_and_make_tree(trunk_repo, 'branch')
+ tree.set_parent_ids(parents)
tree.commit('Branch commit', rev_id='rev-2')
branch_repo = tree.branch.repository
- # Make a new repo stacked on trunk, and copy the new commit's revision
- # and inventory records to it.
- if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
- # RemoteRepository by default builds a default format real
- # repository, but the default format is unstackble. So explicitly
- # make a stackable real repository and use that.
- repo = self.make_repository('stacked', format='1.9')
- repo = bzrdir.BzrDir.open(self.get_url('stacked')).open_repository()
- else:
- repo = self.make_repository('stacked')
branch_repo.lock_read()
self.addCleanup(branch_repo.unlock)
- repo.add_fallback_repository(trunk.repository)
- repo.lock_write()
- repo.start_write_group()
- trunk_repo.lock_read()
- repo.inventories.insert_record_stream(
- branch_repo.inventories.get_record_stream(
- [('rev-2',)], 'unordered', False))
- repo.revisions.insert_record_stream(
- branch_repo.revisions.get_record_stream(
- [('rev-2',)], 'unordered', False))
- self.assertEqual(
- set([('inventories', 'rev-1')]),
- repo.get_missing_parent_inventories())
- # Revisions from resumed write groups can also cause missing parent
- # inventories.
+ return branch_repo
+
+ def make_stackable_repo(self, relpath='trunk'):
+ if isinstance(self.repository_format, remote.RemoteRepositoryFormat):
+ # RemoteRepository by default builds a default format real
+ # repository, but the default format is unstackble. So explicitly
+ # make a stackable real repository and use that.
+ repo = self.make_repository(relpath, format='1.9')
+ repo = bzrdir.BzrDir.open(self.get_url(relpath)).open_repository()
+ else:
+ repo = self.make_repository(relpath)
+ if not repo._format.supports_external_lookups:
+ raise TestNotApplicable('format not stackable')
+ repo.bzrdir._format.set_branch_format(BzrBranchFormat7())
+ return repo
+
+ def reopen_repo_and_resume_write_group(self, repo):
try:
resume_tokens = repo.suspend_write_group()
except errors.UnsuspendableWriteGroup:
@@ -201,9 +190,93 @@
reopened_repo.lock_write()
self.addCleanup(reopened_repo.unlock)
reopened_repo.resume_write_group(resume_tokens)
+ return reopened_repo
+
+ def test_ghost_revision(self):
+ """A parent inventory may be absent if all the needed texts are present.
+ i.e., a ghost revision isn't (necessarily) considered to be a missing
+ parent inventory.
+ """
+ # Make a trunk with one commit.
+ trunk_repo = self.make_stackable_repo()
+ self.make_first_commit(trunk_repo)
+ trunk_repo.lock_read()
+ self.addCleanup(trunk_repo.unlock)
+ # Branch the trunk, add a new commit.
+ branch_repo = self.make_new_commit_in_new_repo(
+ trunk_repo, parents=['rev-1', 'ghost-rev'])
+ inv = branch_repo.get_inventory('rev-2')
+ # Make a new repo stacked on trunk, and then copy into it:
+ # - all texts in rev-2
+ # - the new inventory (rev-2)
+ # - the new revision (rev-2)
+ repo = self.make_stackable_repo('stacked')
+ repo.lock_write()
+ repo.start_write_group()
+ # Add all texts from in rev-2 inventory. Note that this has to exclude
+ # the root if the repo format does not support rich roots.
+ rich_root = branch_repo._format.rich_root_data
+ all_texts = [
+ (ie.file_id, ie.revision) for ie in inv.iter_just_entries()
+ if rich_root or inv.id2path(ie.file_id) != '']
+ repo.texts.insert_record_stream(
+ branch_repo.texts.get_record_stream(all_texts, 'unordered', False))
+ # Add inventory and revision for rev-2.
+ repo.add_inventory('rev-2', inv, ['rev-1', 'ghost-rev'])
+ repo.revisions.insert_record_stream(
+ branch_repo.revisions.get_record_stream(
+ [('rev-2',)], 'unordered', False))
+ # Now, no inventories are reported as missing, even though there is a
+ # ghost.
+ self.assertEqual(set(), repo.get_missing_parent_inventories())
+ # Resuming the write group does not affect
+ # get_missing_parent_inventories.
+ reopened_repo = self.reopen_repo_and_resume_write_group(repo)
+ self.assertEqual(set(), reopened_repo.get_missing_parent_inventories())
+ reopened_repo.abort_write_group()
+
+ def test_get_missing_parent_inventories(self):
+ """A stacked repo with a single revision and inventory (no parent
+ inventory) in it must have all the texts in its inventory (even if not
+ changed w.r.t. to the absent parent), otherwise it will report missing
+ texts/parent inventory.
+
+ The core of this test is that a file was changed in rev-1, but in a
+ stacked repo that only has rev-2
+ """
+ # Make a trunk with one commit.
+ trunk_repo = self.make_stackable_repo()
+ self.make_first_commit(trunk_repo)
+ trunk_repo.lock_read()
+ self.addCleanup(trunk_repo.unlock)
+ # Branch the trunk, add a new commit.
+ branch_repo = self.make_new_commit_in_new_repo(
+ trunk_repo, parents=['rev-1'])
+ inv = branch_repo.get_inventory('rev-2')
+ # Make a new repo stacked on trunk, and copy the new commit's revision
+ # and inventory records to it.
+ repo = self.make_stackable_repo('stacked')
+ repo.lock_write()
+ repo.start_write_group()
+ # Insert a single fulltext inv (using add_inventory because it's
+ # simpler than insert_record_stream)
+ repo.add_inventory('rev-2', inv, ['rev-1'])
+ repo.revisions.insert_record_stream(
+ branch_repo.revisions.get_record_stream(
+ [('rev-2',)], 'unordered', False))
+ # There should be no missing compression parents
+ self.assertEqual(set(),
+ repo.inventories.get_missing_compression_parent_keys())
+ self.assertEqual(
+ set([('inventories', 'rev-1')]),
+ repo.get_missing_parent_inventories())
+ # Resuming the write group does not affect
+ # get_missing_parent_inventories.
+ reopened_repo = self.reopen_repo_and_resume_write_group(repo)
self.assertEqual(
set([('inventories', 'rev-1')]),
reopened_repo.get_missing_parent_inventories())
+ # Adding the parent inventory satisfies get_missing_parent_inventories.
reopened_repo.inventories.insert_record_stream(
branch_repo.inventories.get_record_stream(
[('rev-1',)], 'unordered', False))
More information about the bazaar-commits
mailing list