[MERGE] Move serialisation of inventories into CommitBuilder

Wed Oct 3 00:59:35 BST 2007

On Tue, 2007-10-02 at 19:20 +1000, Andrew Bennetts wrote:
> bb:tweak
> 
> This change seems reasonable to me.  I'm not deeply familiar with CommitBuilder
> though, so if someone else wants to review the basic idea of this change that
> would be great.
> 
> Further comments inline.
> 
> > === modified file 'bzrlib/commit.py'
> > --- bzrlib/commit.py	2007-09-21 04:22:53 +0000
> > +++ bzrlib/commit.py	2007-10-02 01:06:55 +0000
> [...]
> > +            content_summary = self.work_tree.path_content_summary(path)
> > +            if not specific_files or is_inside_any(specific_files, path):
> > +                if content_summary[0] == 'missing':
> > +                    deleted_paths.add(path)
> > +                    self.reporter.missing(path)
> > +                    deleted_ids.append(file_id)
> > +                    continue
> > +            # TODO: have the builder do the nested commit just-in-time IF and
> > +            # only if needed.
> > +            if content_summary[0] == 'tree-reference':
> > +                # enforce repository nested tree policy.
> > +                if (not self.work_tree.supports_tree_reference() or
> 
> Is it really possible that we could have a tree-reference here if work_tree
> doesn't support tree references?  That seems odd.  Can you explain to me how
> that might happen?   (I'm guessing this seems odd to me because of my ignorance,
> rather than because it's a problem.)

path_content_summary reports the status of whats on disk, not the
logical support for a feature. The check here is a policy check, to keep
that method from having to lie ('no its not a subtree, even if it is'),
which keeps it general enough to be reused e.g. by add.

> > === modified file 'bzrlib/memorytree.py'
> > --- bzrlib/memorytree.py	2007-08-23 00:12:35 +0000
> > +++ bzrlib/memorytree.py	2007-09-05 05:51:34 +0000
> > @@ -98,6 +98,26 @@
> >              return None, False, None
> >          return entry.kind, entry.executable, None
> >  
> > +    def path_content_summary(self, path):
> > +        """See Tree.path_content_summary."""
> > +        id = self.path2id(path)
> > +        if id is None:
> > +            return 'missing', None, None, None
> > +        kind = self.kind(id)
> > +        if kind == 'file':
> > +            bytes = self._file_transport.get_bytes(path)
> > +            size = len(bytes)
> > +            executable = self._inventory[id].executable
> 
> It doesn't make sense to me to get the executable bit from the inventory but the
> file size from transport.  Shouldn't you get both from the same source?

This is what we do for support on filesystems that are not natively
supporting the execute bit; as this is primarily test harness support it
seems sufficient unto the purpose to me.

> [...]
> > === modified file 'bzrlib/repository.py'
> > --- bzrlib/repository.py	2007-09-25 03:36:29 +0000
> > +++ bzrlib/repository.py	2007-09-27 21:11:38 +0000
> > @@ -200,14 +200,15 @@
> [...]
> > +        :param content_summary: Summary data from the tree about the paths
> > +            content - stat, length, exec, sha/link target. This is only
> > +            accessed when the entry has a revision of None - that is when it is
> > +            a candidate to commit.
> >          :return: True if a new version of the entry has been recorded.
> >              (Committing a merge where a file was only changed on the other side
> >              will not return True.)
> >          """
> >          if self.new_inventory.root is None:
> >              self._check_root(ie, parent_invs, tree)
> > +        if ie.revision is None:
> > +            kind = content_summary[0]
> 
> I do wonder if it would be a little clearer to unpack the content_summary tuple
> rather than indexing it all the time.  You've done a good job of commenting when
> necessary though so I'm not too worried about this.  It might be fractionally
> quicker though ;)

It will be worth profiling when the body of this method gets to be a
larger chunk of overall time.

> > +        else:
> > +            # ie is carried over from a prior commit
> > +            kind = ie.kind
> > +        # XXX: repository specific check for nested tree support goes here - if
> > +        # the repo doesn't want nested trees we skip it ?
> > +        if (kind == 'tree-reference' and
> > +            not self.repository._format.supports_tree_reference):
> > +            # mismatch between commit builder logic and repository:
> > +            # this needs the entry creation pushed down into the builder.
> > +            raise NotImplementedError
> 
> A brief message in this assert would be good, just to make sure we don't get
> mysterious bug reports from users if they do trigger this somehow.

"raise NotImplementedError('missing subtree support in the repository')"

> > +        # transitional assert only, will remove before release.
> > +        assert ie.kind == kind
> 
> Is there a bug or something tracking this, to make sure we remove it before the
> release?  Maybe stick a "or version != (0, 92)" in the condition ;)

Another couple of days and I hope to have a patch for it; its not
appropriate IMO to break the release via a condition like that, cute
though it may be.

> > +                else:
> > +                    # Either there is only a hash change(no hash cache entry,
> > +                    # or same size content change), or there is no change on
> > +                    # this file at all.
> > +                    # There is a race condition when inserting content into the
> > +                    # knit though that can result in different content being
> > +                    # inserted so even though we may have had a hash cache hit
> > +                    # here we still tell the store the hash we would *not*
> > +                    # store a new text on, which means that it can avoid for us
> > +                    # without a race condition and without double-shaing the
> > +                    # lines.
> > +                    nostore_sha = parent_entry.text_sha1
> 
> I don't quite follow that comment.  That sixty-four word sentence is just a bit
> too convoluted for me to be able to detangle as I read it.  Could you please try
> rephrasing it using shorter sentences?

Uhm, nothing comes to mind; but I've been up on the coal face on this
bit of code for some time. Care to suggest a rephrase?

> > +            if store:
> > +                nostore_sha = None
> > +            try:
> > +                ie.executable = content_summary[2]
> > +                lines = tree.get_file(ie.file_id, path).readlines()
> > +                ie.text_sha1, ie.text_size = self._add_text_to_weave(
> > +                    ie.file_id, lines, heads, nostore_sha)
> > +            except errors.ExistingContent:
> 
> That try block seems overly broad; surely only the final line can raise that
> exception?

Thanks, fixing.

> > +        elif kind == 'symlink':
> > +            current_link_target = content_summary[3]
> > +            if not store:
> > +                # symmlink target is not generic metadata, check if it has
> 
> "symmlink" -> "symlink"

Ditto.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071003/8fea1259/attachment.pgp