[MERGE] Move serialisation of inventories into CommitBuilder

Andrew Bennetts andrew at canonical.com
Tue Oct 2 10:20:38 BST 2007


bb:tweak

This change seems reasonable to me.  I'm not deeply familiar with CommitBuilder
though, so if someone else wants to review the basic idea of this change that
would be great.

Further comments inline.

> === modified file 'bzrlib/commit.py'
> --- bzrlib/commit.py	2007-09-21 04:22:53 +0000
> +++ bzrlib/commit.py	2007-10-02 01:06:55 +0000
[...]
> +            content_summary = self.work_tree.path_content_summary(path)
> +            if not specific_files or is_inside_any(specific_files, path):
> +                if content_summary[0] == 'missing':
> +                    deleted_paths.add(path)
> +                    self.reporter.missing(path)
> +                    deleted_ids.append(file_id)
> +                    continue
> +            # TODO: have the builder do the nested commit just-in-time IF and
> +            # only if needed.
> +            if content_summary[0] == 'tree-reference':
> +                # enforce repository nested tree policy.
> +                if (not self.work_tree.supports_tree_reference() or

Is it really possible that we could have a tree-reference here if work_tree
doesn't support tree references?  That seems odd.  Can you explain to me how
that might happen?   (I'm guessing this seems odd to me because of my ignorance,
rather than because it's a problem.)

[...]
> === modified file 'bzrlib/inventory.py'
> --- bzrlib/inventory.py	2007-09-21 04:22:53 +0000
> +++ bzrlib/inventory.py	2007-09-21 06:10:31 +0000
[...]
>  
> -    def snapshot(self, revision, path, previous_entries,
> -                 work_tree, commit_builder):

Hooray for deleting code!

[...]
> === modified file 'bzrlib/memorytree.py'
> --- bzrlib/memorytree.py	2007-08-23 00:12:35 +0000
> +++ bzrlib/memorytree.py	2007-09-05 05:51:34 +0000
> @@ -98,6 +98,26 @@
>              return None, False, None
>          return entry.kind, entry.executable, None
>  
> +    def path_content_summary(self, path):
> +        """See Tree.path_content_summary."""
> +        id = self.path2id(path)
> +        if id is None:
> +            return 'missing', None, None, None
> +        kind = self.kind(id)
> +        if kind == 'file':
> +            bytes = self._file_transport.get_bytes(path)
> +            size = len(bytes)
> +            executable = self._inventory[id].executable

It doesn't make sense to me to get the executable bit from the inventory but the
file size from transport.  Shouldn't you get both from the same source?

[...]
> === modified file 'bzrlib/repository.py'
> --- bzrlib/repository.py	2007-09-25 03:36:29 +0000
> +++ bzrlib/repository.py	2007-09-27 21:11:38 +0000
> @@ -200,14 +200,15 @@
[...]
> +        :param content_summary: Summary data from the tree about the paths
> +            content - stat, length, exec, sha/link target. This is only
> +            accessed when the entry has a revision of None - that is when it is
> +            a candidate to commit.
>          :return: True if a new version of the entry has been recorded.
>              (Committing a merge where a file was only changed on the other side
>              will not return True.)
>          """
>          if self.new_inventory.root is None:
>              self._check_root(ie, parent_invs, tree)
> +        if ie.revision is None:
> +            kind = content_summary[0]

I do wonder if it would be a little clearer to unpack the content_summary tuple
rather than indexing it all the time.  You've done a good job of commenting when
necessary though so I'm not too worried about this.  It might be fractionally
quicker though ;)

> +        else:
> +            # ie is carried over from a prior commit
> +            kind = ie.kind
> +        # XXX: repository specific check for nested tree support goes here - if
> +        # the repo doesn't want nested trees we skip it ?
> +        if (kind == 'tree-reference' and
> +            not self.repository._format.supports_tree_reference):
> +            # mismatch between commit builder logic and repository:
> +            # this needs the entry creation pushed down into the builder.
> +            raise NotImplementedError

A brief message in this assert would be good, just to make sure we don't get
mysterious bug reports from users if they do trigger this somehow.

> +        # transitional assert only, will remove before release.
> +        assert ie.kind == kind

Is there a bug or something tracking this, to make sure we remove it before the
release?  Maybe stick a "or version != (0, 92)" in the condition ;)

> +                else:
> +                    # Either there is only a hash change(no hash cache entry,
> +                    # or same size content change), or there is no change on
> +                    # this file at all.
> +                    # There is a race condition when inserting content into the
> +                    # knit though that can result in different content being
> +                    # inserted so even though we may have had a hash cache hit
> +                    # here we still tell the store the hash we would *not*
> +                    # store a new text on, which means that it can avoid for us
> +                    # without a race condition and without double-shaing the
> +                    # lines.
> +                    nostore_sha = parent_entry.text_sha1

I don't quite follow that comment.  That sixty-four word sentence is just a bit
too convoluted for me to be able to detangle as I read it.  Could you please try
rephrasing it using shorter sentences?

> +            if store:
> +                nostore_sha = None
> +            try:
> +                ie.executable = content_summary[2]
> +                lines = tree.get_file(ie.file_id, path).readlines()
> +                ie.text_sha1, ie.text_size = self._add_text_to_weave(
> +                    ie.file_id, lines, heads, nostore_sha)
> +            except errors.ExistingContent:

That try block seems overly broad; surely only the final line can raise that
exception?

> +        elif kind == 'symlink':
> +            current_link_target = content_summary[3]
> +            if not store:
> +                # symmlink target is not generic metadata, check if it has

"symmlink" -> "symlink"

-Andrew.




More information about the bazaar mailing list