[MERGE 0.92] move responsibility for storing entries to CommitBuilder

Robert Collins robertc at robertcollins.net
Wed Sep 12 05:26:50 BST 2007


On Wed, 2007-09-12 at 12:20 +1000, Ian Clatworthy wrote:
> Rob,
> 
> Some more comments below to consider.
> 
> Robert Collins wrote:


> > +    def path_content_summary(self, path):
> > +        """See Tree.path_content_summary."""
> > +        id = self.inventory.path2id(path)
> > +        if id is None:
> > +            return ('missing', None, None, None)
> > +        entry = self._inventory[id]
> > +        kind = entry.kind
> > +        if kind == 'file':
> > +            return (kind, entry.text_size, entry.executable, entry.text_sha1)
> > +        elif kind == 'symlink':
> > +            return (kind, None, None, entry.symlink_target)
> > +        else:
> > +            return (kind, None, None, None)
> > +
> 
> As discussed on IRC, it does feel weird to me to be converting from path
> to id when the id is already know in the calling code. It isn't uncommon
> for us to pass in path as an optional parameter for id-centric APIs. We
> could do the same sort of thing here - pass in an optional id for a
> path-centric API.

Its true that this converts to id to do its lookup, but this instance of
the API is not used by any command line operation ever. It's only for
completeness and meeting the 'tree' interface that there are
implementations of this on RevisionTree and DirstateRevisionTree. If we
had mutabletree implementation tests it would be better to put it there.

> IIUIC, passing in id doesn't make semantic sense and we don't need it
> currently for the performance sensitive tree implementations. If that's
> true, then YAGNI is completely agreed. I'm raising it though because
> I've seen quite a few cases in recent months where some trees like ids
> and other like paths so having both in the API has been useful
> performance wise.

Right, see above.

> > === modified file 'bzrlib/workingtree.py'
> > +        if kind == 'file':
> > +            size = stat_result.st_size
> > +            # try for a stat cache lookup
> > +            if not supports_executable():
> > +                executable = None # caller can decide policy.
> > +            else:
> > +                mode = stat_result.st_mode
> > +                executable = bool(stat.S_ISREG(mode) and stat.S_IEXEC & mode)
> > +            sha1 = None # 'stat-hit-check' here
> > +            return (kind, size, executable, sha1)
> 
> Always setting the sha to None doesn't do much for normal (non-initial)
> commit performance. :-) As mentioned above, this needs to be fixed.

Yes. One way to fix this is not to have the path_content_summary API -
to get the data from an iterator or some other api. I'm working on
pushing the use of path_content_summary outside the CommitBuilder
completely, so that this is viable.

> I had a quick look at how that might be done. Simply calling
> get_file_sha(id, file) would be a quick hack except that id isn't passed
> in and it very much defeats the purpose of 'only return sha if known'.

A new variant of that api might be a useful thing to add to tje
workingtree_implementations tests - one that is None-or-digest in
result.

> For WorkingTree4, there's no _hashcache attribute so either an abstract
> method (tree._get_sha_if_known say) is needed instead or a custom
> path_content_summary method is needed for that tree.

I think its reasonable to add a method; it would want to take the stat
result and path, so that it does not end up statting itself.

> Thinking out loud, does the caller of path_content_summary always know
> when to skip the sha lookup (because the file is being added and
> therefore has no parents)? If so, is it worth passing an extra parameter
> in saying whether to skip the cache lookup altogether? Sorry if that's a
> dumb question but not doing stuff is often a win over doing it quickly. :-)

I think this is over-optimisation, doing a lookup is the common case,
and the uncommon case will not pay much for a single lookup - though
like I mention above, pushing this outside CommitBuilder makes
path_content_summary possibly unneeded altogether.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070912/e2c6894f/attachment.pgp 


More information about the bazaar mailing list