RFC: dirstate-key-cache?
John Arbash Meinel
john at arbash-meinel.com
Tue Sep 25 22:31:21 BST 2007
Robert Collins wrote:
> On Tue, 2007-09-25 at 16:17 -0500, John Arbash Meinel wrote:
>
>
>>>> I'm not sure what you need to use them for, either. Since we have the
>>>> optimized cmp_by_dirs and cmp_path_by_dirblock functions. Which can do
>>>> the comparison without having to .split() at all. (Which is one of the
>>>> reasons why the C bisect is 17x faster.)
>>> So, I lost 4 seconds of performance adding the stat cache back in, and
>>> about 4 of those seconds are in _get_entry. Which is why I was wondering
>>> about a cache, and the C extensions didn't help at all with the
>>> performance there.
>>>
>>> -Rob
>> Out of curiosity, which _get_entry call? And the performance of what?
>> (commit? status?)
>
> WorkingTree._path_content_summary -> WorkingTree4._sha_from_stat ->
> self.current_state()._get_entry(0, path)
Which is during commit, right?
And the big killer here is that your iterator isn't returning this
information for you, so you have to go back and pull it out again. At a
minimum you have to do a split() to get back the containing directory, etc.
IMO, what you would really want is to change your commit iterator so
that you can get all of your information about each node without having
to do another lookup.
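A minimal sketch of that idea, assuming dirblocks shaped as (dirname, [(basename, entry), ...]) pairs. The function name iter_entries and this data layout are hypothetical, for illustration only, and are not bzrlib's actual API:

```python
import posixpath


def iter_entries(dirblocks):
    """Yield (path, entry) pairs so callers never need a second lookup.

    `dirblocks` is an assumed layout for this sketch: an iterable of
    (dirname, [(basename, entry), ...]) pairs in dirblock order.
    """
    for dirname, entries in dirblocks:
        for basename, entry in entries:
            # The entry travels with the path, so the consumer does not
            # have to split() the path and search for it again.
            yield posixpath.join(dirname, basename), entry
```

A consumer would then do "for path, entry in iter_entries(blocks)" and read the sha/stat details straight from entry, rather than calling back into something like _get_entry for each path.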
...
>
> I'm thinking now about doing a last-looked-up-dirblock cache.
>
> I'd expect that to work well, as we iterate in dirblock order during
> commit now.
>
> -Rob
But barring that, I think this would work well. I actually already use
something similar in the _iter_changes code. Specifically there are
"last_source_parent, last_target_parent" variables which cache the
containing directory entry. Otherwise we had 2 _get_entry lookups for
*every* file in the tree (because we need to check if they have moved to
a new directory, etc.)
Having a single "last" variable was sufficient to remove all of the
extra lookups on a pristine tree. (Obviously you need to do
Num_directory lookups, but you are trying to avoid Num_files lookups.)
So I'm guessing that rather than a cache, just a 'last' pointer may get
you what you want, and in a much simpler way.
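A rough sketch of the 'last' pointer approach, under stated assumptions: the class DirblockIndex, its bisect_count counter, and the (dirname, {basename: entry}) layout are all made up for illustration and are not the real dirstate code. It only shows that one remembered block is enough when paths arrive in dirblock order:

```python
import bisect
import posixpath


class DirblockIndex:
    """Hypothetical stand-in for a dirstate lookup with a 'last' pointer."""

    def __init__(self, dirblocks):
        # Assumed layout: iterable of (dirname, {basename: entry}) pairs.
        self._blocks = sorted(dirblocks, key=lambda block: block[0])
        self._dirnames = [dirname for dirname, _ in self._blocks]
        self._last_dirname = None
        self._last_entries = None
        self.bisect_count = 0  # counts the slow-path lookups

    def _find_block(self, dirname):
        # Fast path: same directory as the previous lookup.
        if dirname == self._last_dirname:
            return self._last_entries
        # Slow path: bisect for the block, then remember it.
        self.bisect_count += 1
        index = bisect.bisect_left(self._dirnames, dirname)
        if index < len(self._dirnames) and self._dirnames[index] == dirname:
            entries = self._blocks[index][1]
        else:
            entries = None
        self._last_dirname = dirname
        self._last_entries = entries
        return entries

    def get_entry(self, path):
        dirname, basename = posixpath.split(path)
        entries = self._find_block(dirname)
        if entries is None:
            return None
        return entries.get(basename)
```

When paths are looked up in dirblock order, the slow path runs once per directory rather than once per file. The sketch still pays for a split() on every path, which is the cost that returning the entry from the iterator would avoid.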
John
=:->