[MERGE] Update to bzr.dev.
Andrew Bennetts
andrew at canonical.com
Thu Jun 12 08:22:32 BST 2008
Some more review comments:
Robert Collins wrote:
[...]
> === modified file 'bzrlib/branch.py'
> --- bzrlib/branch.py 2008-06-07 22:15:25 +0000
> +++ bzrlib/branch.py 2008-06-11 07:22:00 +0000
> @@ -1465,6 +1465,9 @@
> """See Branch.set_revision_history."""
> if 'evil' in debug.debug_flags:
> mutter_callsite(3, "set_revision_history scales with history.")
> + check_not_reserved_id = _mod_revision.check_not_reserved_id
> + for rev_id in rev_history:
> + check_not_reserved_id(rev_id)
> self._write_revision_history(rev_history)
> self._clear_cached_state()
> self._cache_revision_history(rev_history)
> @@ -1528,9 +1531,11 @@
> raise errors.NoSuchRevision(self, revision_id)
> current_rev_id = revision_id
> new_history = []
> + check_not_reserved_id = _mod_revision.check_not_reserved_id
> # Do not include ghosts or graph origin in revision_history
> while (current_rev_id in parents_map and
> len(parents_map[current_rev_id]) > 0):
> + check_not_reserved_id(current_rev_id)
> new_history.append(current_rev_id)
> current_rev_id = parents_map[current_rev_id][0]
> parents_map = graph.get_parent_map([current_rev_id])
These changes make sense. They suggest we should have direct per_branch tests
for set_revision_history (and maybe generate_revision_history too) to verify
that they reject reserved revision IDs, but I don't see any.
> === modified file 'bzrlib/bundle/serializer/v4.py'
[...]
> def write_files(self):
> """Write bundle records for all revisions of all files"""
> - for vf, file_id, revision_ids in self.iter_file_revisions():
> - self.add_mp_records('file', file_id, vf, revision_ids)
> + texts = self.repository.texts
> + text_keys = []
> + for file_id, revision_ids in \
> + self.repository.fileids_altered_by_revision_ids(
> + self.revision_ids).iteritems():
> + revision_ids = list(revision_ids)
> + for revision_id in revision_ids:
> + text_keys.append((file_id, revision_id))
> + self.add_mp_records_keys('file', texts, text_keys)
This is a little confusing: I don't see any reason why you list()ify
revision_ids before iterating over it.
Also, a minor source of confusion here is that this method uses both
“self.revision_ids” and “revision_ids” in the same 3-line statement. Perhaps
this would be a little clearer:
    fileids_altered = self.repository.fileids_altered_by_revision_ids(
        self.revision_ids)
    for file_id, revision_ids in fileids_altered.iteritems():
        ...
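Spelled out fully, the key-building part of the loop could read something
like this (a sketch only: a plain dict stands in for the real
fileids_altered_by_revision_ids() result, and .items() for iteritems; the
sorting is just to make the example deterministic):

```python
def text_keys_for(fileids_altered):
    # fileids_altered maps file_id -> set of revision ids.  No list()
    # copy is needed before iterating the set.
    text_keys = []
    for file_id, revision_ids in sorted(fileids_altered.items()):
        for revision_id in sorted(revision_ids):
            text_keys.append((file_id, revision_id))
    return text_keys

fileids_altered = {'file-a': {'rev-1', 'rev-2'}, 'file-b': {'rev-1'}}
text_keys = text_keys_for(fileids_altered)
```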
[...]
> + def add_mp_records_keys(self, repo_kind, vf, keys):
> """Add multi-parent diff records to a bundle"""
> - revision_ids = list(multiparent.topo_iter(vf, revision_ids))
> - mpdiffs = vf.make_mpdiffs(revision_ids)
> - sha1s = vf.get_sha1s(revision_ids)
> - parent_map = vf.get_parent_map(revision_ids)
> - for mpdiff, revision_id, sha1, in zip(mpdiffs, revision_ids, sha1s):
> - parents = parent_map[revision_id]
> + ordered_keys = list(multiparent.topo_iter_keys(vf, keys))
> + mpdiffs = vf.make_mpdiffs(ordered_keys)
> + sha1s = vf.get_sha1s(ordered_keys)
> + parent_map = vf.get_parent_map(ordered_keys)
> + for mpdiff, item_key, sha1, in zip(mpdiffs, ordered_keys, sha1s):
> + parents = [key[-1] for key in parent_map[item_key]]
> text = ''.join(mpdiff.to_patch())
> + # Infer file id records as appropriate.
> + if len(item_key) == 2:
> + file_id = item_key[0]
> + else:
> + file_id = None
> self.bundle.add_multiparent_record(text, sha1, parents, repo_kind,
> - revision_id, file_id)
> + item_key[-1], file_id)
It's a bit of a wart that (revid,) and similar is needed everywhere. For
example changing “revision_id” to “item_key[-1]” definitely seems like a step
backwards for clarity. Mostly the changes have tended to make the code more
readable (e.g. the new “texts” attribute rather than weave_store and friends),
but this aspect of the patch is making me cringe a little bit every time. Maybe
the multi-element keys have spread through more of the API than they should
have? It feels like an implementation detail of a low level that has leaked
into a few more layers than it needed to.
Signalling the type of a record by the number of elements in the key also feels
a bit dirty to me.
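One hypothetical alternative (not something in the patch): make the record
type explicit with named key classes instead of inferring it from tuple
length. namedtuples still behave as plain tuples for existing indexing
code, but readers get .revision_id instead of key[-1]:

```python
from collections import namedtuple

# Hypothetical key types; existing tuple-based code keeps working
# because namedtuples are tuples.
TextKey = namedtuple('TextKey', 'file_id revision_id')
RevisionKey = namedtuple('RevisionKey', 'revision_id')

def record_info(key):
    # The type, not len(key), says whether there is a file_id.
    file_id = key.file_id if isinstance(key, TextKey) else None
    return file_id, key.revision_id
```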
> === modified file 'bzrlib/bzrdir.py'
[...]
> - _revision_store = TextRevisionStore(TextStore(revision_transport,
> - prefixed=False,
> - compressed=True))
> + from bzrlib.xml5 import serializer_v5
> + revision_store = RevisionTextStore(revision_transport,
> + serializer_v5, False, versionedfile.PrefixMapper(),
> + lambda:True, lambda:True)
> try:
> - transaction = WriteTransaction()
> for i, rev_id in enumerate(self.converted_revs):
> self.pb.update('write revision', i, len(self.converted_revs))
> - _revision_store.add_revision(self.revisions[rev_id], transaction)
> + text = serializer_v5.write_revision_to_string(
> + self.revisions[rev_id])
> + key = (rev_id,)
> + revision_store.add_lines(key, None, osutils.split_lines(text))
It seems a bit odd that you pass the serializer to the revision_store, but then
you still need to serialize it manually before giving it to the revision_store.
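A sketch of the kind of API that would hide the serialization inside the
store (hypothetical: the RevisionTextStore in the patch has no add_revision
method and only exposes add_lines; FakeSerializer below is a stand-in for
serializer_v5):

```python
class RevisionStoreSketch(object):
    def __init__(self, serializer):
        self._serializer = serializer
        self._texts = {}

    def add_revision(self, rev_id, revision):
        # The store owns the serializer, so callers hand over the
        # revision object and never touch the text form themselves.
        text = self._serializer.write_revision_to_string(revision)
        self._texts[(rev_id,)] = text.splitlines(True)

class FakeSerializer(object):
    # Stand-in for serializer_v5; real serializers emit XML.
    def write_revision_to_string(self, revision):
        return 'revision: %s\n' % (revision,)

store = RevisionStoreSketch(FakeSerializer())
store.add_revision('rev-1', 'payload')
```

With something like this, the upgrade loop would shrink to a single
store.add_revision(rev_id, self.revisions[rev_id]) call.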
> === modified file 'bzrlib/check.py'
[...]
> - inventoried = set(self.inventory_weave.versions())
> + inventoried = set(key[-1] for key in self.inventory_weave.keys())
I'd really *much* rather avoid magic numbers like -1 that assume the reader (and
writer!) correctly knows the structure of random keys. I understand it can be
necessary in code like dirstate for performance reasons, but the scope of the
code and data involved in dirstate is basically confined to a single file.
Here it's all over bzrlib.
A small improvement in this particular case would be to rename “inventoried” to
“inventoried_revisions” to make it clear what sort of thing we're putting into
the set despite the -1. But my preferred solution would be to get rid of the
indexing completely.
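Concretely, unpacking inside the generator expression would both name the
thing and drop the index entirely (sketch with literal keys):

```python
inventory_keys = [('rev-1',), ('rev-2',)]

# key[-1] works, but only if the reader already knows the key shape.
# Unpacking states the shape and fails loudly if it is ever wrong:
inventoried_revisions = set(
    revision_id for (revision_id,) in inventory_keys)
```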
[...]
> + result = weave_checker.check_file_version_parents(
> + self.repository.texts, progress_bar=self.progress)
> + self.checked_weaves = weave_checker.file_ids
> + bad_parents, unused_versions = result
> + bad_parents = bad_parents.items()
> + for text_key, (stored_parents, correct_parents) in bad_parents:
> + # XXX not ready for id join/split operations.
> + weave_id = text_key[0]
> + revision_id = text_key[-1]
For instance, here I'm left wondering why not [0] and [1]. Or why not
“weave_id, revision_id = text_key”. What might be between [0] and [-1], and why
is it ok to ignore here?
Basically, these tuples where we used to just have scalar revision IDs are a
constant low-level irritation when reading this patch :(
> === modified file 'bzrlib/fetch.py'
[...]
> - except errors.NoSuchRevision:
> + except errors.NoSuchRevision, e:
> + import pdb;pdb.set_trace()
You probably don't need this line ;)
[...]
> + def yield_roots():
> + for root_id, rev_id in root_id_order:
> + key = (root_id, rev_id)
It probably doesn't make a real difference, but it's slightly wasteful to unpack
a tuple and then immediately make an identical tuple.
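That is, since each element is already the (root_id, rev_id) key tuple, the
generator can just yield it (sketch with literal data):

```python
root_id_order = [('root-a', 'rev-1'), ('root-b', 'rev-2')]

def yield_roots(root_id_order):
    # No unpack-and-repack: the element already is the key tuple.
    for key in root_id_order:
        yield key
```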
-Andrew.