RFC: dirstate-key-cache?

Robert Collins robertc at robertcollins.net
Tue Sep 25 22:32:01 BST 2007


On Tue, 2007-09-25 at 16:17 -0500, John Arbash Meinel wrote:


> >> I'm not sure what you need to use them for, either, since we have
> >> the optimized cmp_by_dirs and cmp_path_by_dirblock functions, which
> >> can do the comparison without having to .split() at all. (Which is
> >> one of the reasons why the C bisect is 17x faster.)
> > 
> > So, I lost 4 seconds of performance adding the stat cache back in, and
> > about 4 of those seconds are in _get_entry. Which is why I was wondering
> > about a cache, and the C extensions didn't help at all with the
> > performance there.
> > 
> > -Rob
> 
> Out of curiosity, which _get_entry call? And the performance of what?
> (commit? status?)

WorkingTree._path_content_summary -> WorkingTree4._sha_from_stat ->
self.current_state()._get_entry(0, path)

> It would be more interesting to see a bit of your lsprof results. There
> are quite a few layers to DirState._get_entry(), in that it splits the
> path, and then goes to _get_block_entry_index, which calls
> _find_block_index_from_key, which at least uses the optimized bisect
> function.

Right.
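
For anyone following along, the layering described above can be sketched
roughly as below. This is a simplified illustration, not the bzrlib code:
the dirblock layout, the get_entry and bisect_left_by names, and the entry
tuples are all assumptions made for the example, and it splits paths in
Python rather than using the optimized comparison functions.

```python
def bisect_left_by(items, target, key):
    """Plain binary search: first index whose key(item) is >= target."""
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(items[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def get_entry(dirblocks, path):
    """Look up path in a list of (dirname, entries) blocks.

    Hypothetical layout: blocks are sorted by the split directory path,
    and each block's entries are (basename, file_id) sorted by basename.
    """
    # Layer 1: split the path into directory and basename.
    dirname, _, basename = path.rpartition('/')
    # Layer 2: bisect for the block that holds this directory.
    block_index = bisect_left_by(
        dirblocks, dirname.split('/'), lambda block: block[0].split('/'))
    if block_index == len(dirblocks) or dirblocks[block_index][0] != dirname:
        return None
    # Layer 3: bisect inside the block for the entry itself.
    entries = dirblocks[block_index][1]
    entry_index = bisect_left_by(entries, basename, lambda entry: entry[0])
    if entry_index == len(entries) or entries[entry_index][0] != basename:
        return None
    return entries[entry_index]


# Comparing split paths puts 'a/b' before 'a-b', unlike a plain string sort.
DIRBLOCKS = [
    ('', [('README', 'id-readme'), ('a', 'id-a'), ('a-b', 'id-a-b')]),
    ('a', [('b', 'id-b'), ('c.txt', 'id-c')]),
    ('a/b', [('x', 'id-x')]),
    ('a-b', [('y', 'id-y')]),
]
```

Every path lookup pays for the split plus two bisects, which is the
per-call cost under discussion.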

> I did that optimization because I was doing DirState.add() 50k times as
> part of the benchmarks, and I saw that it was spending an awful lot of
> time bisecting. (IIRC it was doing it 2 times, 1 to get the containing
> directory's entry, and a second to get the location where it should
> put the current record.)
> 
> Now, if you were going by file_id through _get_entry (rather than by
> path), I would expect at least some overhead at building up the file_id
> index. (4s for a huge tree might be possible. It seems a bit long, but
> not impossibly so.)
> 
> So it might be _get_id_index() which is actually hurting you.

Nope.

> I'm just curious why having the stat cache is causing you to call
> _get_entry enough times to cause problems. Overall, I'm too unsure
> about what you are doing to give any more info here.

I'm thinking now about doing a last-looked-up-dirblock cache.

I'd expect that to work well, as we iterate in dirblock order during
commit now.
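
A rough sketch of the idea, under stated assumptions: the DirblockLookup
class, its attributes and the (dirname, entries) layout are hypothetical
and only meant to show the shape of the cache, not what DirState does
today. A real version would also have to drop the cached block whenever
the dirblocks are modified.

```python
import bisect


class DirblockLookup:
    """Hypothetical wrapper that remembers the last dirblock looked up."""

    def __init__(self, dirblocks):
        # dirblocks: list of (dirname, entries) sorted by split dirname;
        # entries: list of (basename, value) tuples sorted by basename.
        self._dirblocks = dirblocks
        self._block_keys = [dirname.split('/') for dirname, _ in dirblocks]
        self._last_dirname = None
        self._last_block = None
        self.block_searches = 0  # counts bisects over the block list

    def _find_block(self, dirname):
        # Cache hit: same directory as the previous lookup, skip the bisect.
        if dirname == self._last_dirname:
            return self._last_block
        self.block_searches += 1
        index = bisect.bisect_left(self._block_keys, dirname.split('/'))
        if (index < len(self._dirblocks)
                and self._dirblocks[index][0] == dirname):
            block = self._dirblocks[index][1]
        else:
            block = None
        self._last_dirname, self._last_block = dirname, block
        return block

    def get_entry(self, path):
        dirname, _, basename = path.rpartition('/')
        block = self._find_block(dirname)
        if block is None:
            return None
        # (basename,) sorts just before any (basename, value) tuple.
        index = bisect.bisect_left(block, (basename,))
        if index < len(block) and block[index][0] == basename:
            return block[index]
        return None


lookup = DirblockLookup([
    ('', [('README', 'id-readme'), ('a', 'id-a')]),
    ('a', [('b', 'id-b'), ('c.txt', 'id-c')]),
    ('a/b', [('x', 'id-x')]),
])
# Two lookups in the same directory cost only one block search.
first = lookup.get_entry('a/b')
second = lookup.get_entry('a/c.txt')
```

When paths arrive in dirblock order, each directory should trigger one
block search, and every other lookup in that directory reuses the cached
block.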

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.