[MERGE 0.92] move responsibility for storing entries to CommitBuilder

Wed Sep 12 03:20:00 BST 2007

Rob,

Some more comments below to consider.

Robert Collins wrote:

> Caveats:
>  - I'm not sure if path_content_summary should exist, or if its entirely
> correct for windows at the moment, and it doesn't currently offer the
> hash-cache optimisation ever, so this is not suitable for inclusion in a
> release.

> +        head_set = self.repository.get_graph().heads(parent_candiate_entries.keys())
> +        heads = []
> +        for inv in parent_invs:
> +            if ie.file_id in inv:
> +                old_rev = inv[ie.file_id].revision
> +                if old_rev in head_set:
> +                    heads.append(inv[ie.file_id].revision)
> +                    head_set.remove(inv[ie.file_id].revision)

The last 2 lines can reuse the old_rev variable rather than look up the
value each time.

> +    def path_content_summary(self, path):
> +        """See Tree.path_content_summary."""
> +        id = self.inventory.path2id(path)
> +        if id is None:
> +            return ('missing', None, None, None)
> +        entry = self._inventory[id]
> +        kind = entry.kind
> +        if kind == 'file':
> +            return (kind, entry.text_size, entry.executable, entry.text_sha1)
> +        elif kind == 'symlink':
> +            return (kind, None, None, entry.symlink_target)
> +        else:
> +            return (kind, None, None, None)
> +

As discussed on IRC, it does feel weird to me to be converting from path
to id when the id is already know in the calling code. It isn't uncommon
for us to pass in path as an optional parameter for id-centric APIs. We
could do the same sort of thing here - pass in an optional id for a
path-centric API.

IIUIC, passing in id doesn't make semantic sense and we don't need it
currently for the performance sensitive tree implementations. If that's
true, then YAGNI is completely agreed. I'm raising it though because
I've seen quite a few cases in recent months where some trees like ids
and other like paths so having both in the API has been useful
performance wise.

> === modified file 'bzrlib/workingtree.py'
> +        if kind == 'file':
> +            size = stat_result.st_size
> +            # try for a stat cache lookup
> +            if not supports_executable():
> +                executable = None # caller can decide policy.
> +            else:
> +                mode = stat_result.st_mode
> +                executable = bool(stat.S_ISREG(mode) and stat.S_IEXEC & mode)
> +            sha1 = None # 'stat-hit-check' here
> +            return (kind, size, executable, sha1)

Always setting the sha to None doesn't do much for normal (non-initial)
commit performance. :-) As mentioned above, this needs to be fixed.

I had a quick look at how that might be done. Simply calling
get_file_sha(id, file) would be a quick hack except that id isn't passed
in and it very much defeats the purpose of 'only return sha if known'.

For WorkingTree3, I gather _hashcache could gain a method called
something like get_sha_if_known that reuses much of the low level code
there already but doesn't go on to calculate it if unknown.

For WorkingTree4, there's no _hashcache attribute so either an abstract
method (tree._get_sha_if_known say) is needed instead or a custom
path_content_summary method is needed for that tree.

Thinking out loud, does the caller of path_content_summary always know
when to skip the sha lookup (because the file is being added and
therefore has no parents)? If so, is it worth passing an extra parameter
in saying whether to skip the cache lookup altogether? Sorry if that's a
dumb question but not doing stuff is often a win over doing it quickly. :-)

Time to get my head around more of the WorkingTree4 code ...

Ian C.