[MERGE] Update to bzr.dev.

Andrew Bennetts andrew at canonical.com
Thu Jun 12 08:22:32 BST 2008


Some more review comments:

Robert Collins wrote:
[...]
> === modified file 'bzrlib/branch.py'
> --- bzrlib/branch.py	2008-06-07 22:15:25 +0000
> +++ bzrlib/branch.py	2008-06-11 07:22:00 +0000
> @@ -1465,6 +1465,9 @@
>          """See Branch.set_revision_history."""
>          if 'evil' in debug.debug_flags:
>              mutter_callsite(3, "set_revision_history scales with history.")
> +        check_not_reserved_id = _mod_revision.check_not_reserved_id
> +        for rev_id in rev_history:
> +            check_not_reserved_id(rev_id)
>          self._write_revision_history(rev_history)
>          self._clear_cached_state()
>          self._cache_revision_history(rev_history)
> @@ -1528,9 +1531,11 @@
>              raise errors.NoSuchRevision(self, revision_id)
>          current_rev_id = revision_id
>          new_history = []
> +        check_not_reserved_id = _mod_revision.check_not_reserved_id
>          # Do not include ghosts or graph origin in revision_history
>          while (current_rev_id in parents_map and
>                 len(parents_map[current_rev_id]) > 0):
> +            check_not_reserved_id(current_rev_id)
>              new_history.append(current_rev_id)
>              current_rev_id = parents_map[current_rev_id][0]
>              parents_map = graph.get_parent_map([current_rev_id])

These changes make sense.  They suggest we should have direct per_branch tests
for set_revision_history (and maybe generate_revision_history too) verifying
that they reject reserved revision IDs, but I don't see any in this patch.
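To sketch the kind of test I mean -- everything below is a stand-in, not the
real bzrlib test API; it just mirrors the bzr convention that reserved
revision ids are the ones ending in ':' (e.g. 'null:', 'current:'):

```python
class ReservedId(ValueError):
    """Raised when a reserved revision id is used where a real one is required."""

def check_not_reserved_id(revision_id):
    # Stand-in for _mod_revision.check_not_reserved_id: bzr's reserved
    # revision ids end in ':'.
    if revision_id is not None and revision_id.endswith(':'):
        raise ReservedId(revision_id)

def set_revision_history(rev_history):
    # Stand-in for the validation Branch.set_revision_history now performs:
    # every id is checked before anything is written.
    for rev_id in rev_history:
        check_not_reserved_id(rev_id)
    return list(rev_history)

# A per_branch test would assert both the happy path and the rejection:
assert set_revision_history(['rev-1', 'rev-2']) == ['rev-1', 'rev-2']
try:
    set_revision_history(['rev-1', 'current:'])
except ReservedId:
    pass  # expected
```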

> === modified file 'bzrlib/bundle/serializer/v4.py'
[...]
>      def write_files(self):
>          """Write bundle records for all revisions of all files"""
> -        for vf, file_id, revision_ids in self.iter_file_revisions():
> -            self.add_mp_records('file', file_id, vf, revision_ids)
> +        texts = self.repository.texts
> +        text_keys = []
> +        for file_id, revision_ids in \
> +            self.repository.fileids_altered_by_revision_ids(
> +                self.revision_ids).iteritems():
> +            revision_ids = list(revision_ids)
> +            for revision_id in revision_ids:
> +                text_keys.append((file_id, revision_id))
> +        self.add_mp_records_keys('file', texts, text_keys)

This is a little confusing.  I don't see any reason to list()ify
revision_ids before iterating over it.

Also, a minor source of confusion here is that this method uses both
“self.revision_ids” and “revision_ids” in the same 3-line statement.  Perhaps
this would be a little clearer:

        fileids_altered = self.repository.fileids_altered_by_revision_ids(
            self.revision_ids)
        for file_id, revision_ids in fileids_altered.iteritems():
            ...
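Self-contained version of that shape, with a plain dict standing in for
fileids_altered_by_revision_ids() (and items() where the Python 2 patch uses
iteritems()) -- note the set of revision ids iterates fine without the
list() copy:

```python
# Stand-in for the mapping returned by fileids_altered_by_revision_ids().
fileids_altered = {
    'file-a': {'rev-1', 'rev-2'},
    'file-b': {'rev-2'},
}
text_keys = []
for file_id, revision_ids in fileids_altered.items():
    # No list(revision_ids) needed; iterating the set directly is enough.
    for revision_id in revision_ids:
        text_keys.append((file_id, revision_id))
```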

[...]
> +    def add_mp_records_keys(self, repo_kind, vf, keys):
>          """Add multi-parent diff records to a bundle"""
> -        revision_ids = list(multiparent.topo_iter(vf, revision_ids))
> -        mpdiffs = vf.make_mpdiffs(revision_ids)
> -        sha1s = vf.get_sha1s(revision_ids)
> -        parent_map = vf.get_parent_map(revision_ids)
> -        for mpdiff, revision_id, sha1, in zip(mpdiffs, revision_ids, sha1s):
> -            parents = parent_map[revision_id]
> +        ordered_keys = list(multiparent.topo_iter_keys(vf, keys))
> +        mpdiffs = vf.make_mpdiffs(ordered_keys)
> +        sha1s = vf.get_sha1s(ordered_keys)
> +        parent_map = vf.get_parent_map(ordered_keys)
> +        for mpdiff, item_key, sha1, in zip(mpdiffs, ordered_keys, sha1s):
> +            parents = [key[-1] for key in parent_map[item_key]]
>              text = ''.join(mpdiff.to_patch())
> +            # Infer file id records as appropriate.
> +            if len(item_key) == 2:
> +                file_id = item_key[0]
> +            else:
> +                file_id = None
>              self.bundle.add_multiparent_record(text, sha1, parents, repo_kind,
> -                                               revision_id, file_id)
> +                                               item_key[-1], file_id)

It's a bit of a wart that (revid,) and similar tuples are needed everywhere.
For example, changing “revision_id” to “item_key[-1]” definitely seems like a
step backwards for clarity.  Mostly the changes have tended to make the code
more readable (e.g. the new “texts” attribute rather than weave_store and
friends), but this aspect of the patch makes me cringe a little every time.
Maybe the multi-element keys have spread through more of the API than they
should have?  It feels like an implementation detail of a low level that has
leaked into more layers than it needed to.

Signalling the type of a record by the number of elements in the key also feels
a bit dirty to me.
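If we do have to dispatch on key shape, I'd at least rather see it done with
explicit unpacking in one named helper -- the function below is illustrative
only (the name is mine), but it documents both shapes in one place and raises
ValueError the moment an unexpected shape turns up:

```python
def split_item_key(item_key):
    # Explicit unpacking documents the two key shapes we expect and
    # fails loudly on anything else, unlike item_key[-1].
    if len(item_key) == 2:
        file_id, revision_id = item_key   # text key: (file_id, revision_id)
    else:
        (revision_id,) = item_key         # revision/inventory key: (revision_id,)
        file_id = None
    return file_id, revision_id
```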

> === modified file 'bzrlib/bzrdir.py'
[...]
> -        _revision_store = TextRevisionStore(TextStore(revision_transport,
> -                                                      prefixed=False,
> -                                                      compressed=True))
> +        from bzrlib.xml5 import serializer_v5
> +        revision_store = RevisionTextStore(revision_transport,
> +            serializer_v5, False, versionedfile.PrefixMapper(),
> +            lambda:True, lambda:True)
>          try:
> -            transaction = WriteTransaction()
>              for i, rev_id in enumerate(self.converted_revs):
>                  self.pb.update('write revision', i, len(self.converted_revs))
> -                _revision_store.add_revision(self.revisions[rev_id], transaction)
> +                text = serializer_v5.write_revision_to_string(
> +                    self.revisions[rev_id])
> +                key = (rev_id,)
> +                revision_store.add_lines(key, None, osutils.split_lines(text))

It seems a bit odd that you pass the serializer to the revision_store, but
then still need to serialize the revision manually before handing the
resulting text to the revision_store's add_lines.

> === modified file 'bzrlib/check.py'
[...]
> -        inventoried = set(self.inventory_weave.versions())
> +        inventoried = set(key[-1] for key in self.inventory_weave.keys())

I'd really *much* rather avoid magic numbers like -1 that assume the reader (and
writer!) correctly knows the structure of random keys.  I understand it can be
necessary in code like dirstate for performance reasons, but the scope of the
code and data involved in dirstate is basically confined to a single file.
Here it's all over bzrlib.

A small improvement in this particular case would be to rename “inventoried” to
“inventoried_revisions” to make it clear what sort of thing we're putting into
the set despite the -1.  But my preferred solution would be to get rid of the
indexing completely.
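What I mean by getting rid of the indexing: unpack the 1-tuple keys in the
comprehension itself, so the expected shape is stated explicitly and a key of
the wrong length raises instead of silently grabbing the last element (sample
data is made up):

```python
# Stand-in for self.inventory_weave.keys(): inventory keys are 1-tuples.
inventory_keys = [('rev-1',), ('rev-2',)]

# Unpacking (revision_id,) documents the shape; key[-1] hides it.
inventoried_revisions = set(
    revision_id for (revision_id,) in inventory_keys)
```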

[...]
> +        result = weave_checker.check_file_version_parents(
> +            self.repository.texts, progress_bar=self.progress)
> +        self.checked_weaves = weave_checker.file_ids
> +        bad_parents, unused_versions = result
> +        bad_parents = bad_parents.items()
> +        for text_key, (stored_parents, correct_parents) in bad_parents:
> +            # XXX not ready for id join/split operations.
> +            weave_id = text_key[0]
> +            revision_id = text_key[-1]

For instance, here I'm left wondering why not [0] and [1].  Or why not
“weave_id, revision_id = text_key”.  What might be between [0] and [-1], and why
is it ok to ignore here?

Basically, these tuples where we used to just have scalar revision IDs are a
constant low-level irritation when reading this patch :(
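The unpack I'd prefer in that check.py hunk, with made-up sample data: it
states that text keys are (file_id, revision_id) pairs, and ValueError-s
immediately if one ever isn't, instead of quietly ignoring whatever might sit
between [0] and [-1]:

```python
# Stand-in for the bad_parents mapping: text key -> (stored, correct) parents.
bad_parents = {
    ('file-a', 'rev-2'): (('rev-1',), ('rev-0',)),
}
for text_key, (stored_parents, correct_parents) in bad_parents.items():
    # Explicit 2-tuple unpack: raises ValueError on any stray key shape.
    weave_id, revision_id = text_key
```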

> === modified file 'bzrlib/fetch.py'
[...]
> -        except errors.NoSuchRevision:
> +        except errors.NoSuchRevision, e:
> +            import pdb;pdb.set_trace()

You probably don't need this line ;)

[...]
> +        def yield_roots():
> +            for root_id, rev_id in root_id_order:
> +                key = (root_id, rev_id)

It probably doesn't make a real difference, but it's slightly wasteful to unpack
a tuple and then immediately make an identical tuple.
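I.e. something like this (sample data made up) -- the pairs are already
(root_id, rev_id) tuples, so they can be yielded as-is:

```python
root_id_order = [('root-a', 'rev-1'), ('root-b', 'rev-2')]

def yield_roots():
    # No need to unpack and rebuild: each key already is (root_id, rev_id).
    for key in root_id_order:
        yield key
```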

-Andrew.

More information about the bazaar mailing list