[brisbane-core MERGE] CHKInventory.iter_non_root_entries()
John Arbash Meinel
john at arbash-meinel.com
Fri Mar 6 21:15:39 GMT 2009
Ian Clatworthy wrote:
> So fast-import now kind of works for CHK repo formats, give or
> take 'deleteall' directives (as used by e.g. darcs-fast-export).
> Unfortunately though, it's rather slow at the moment - about
> 5-7 times slower (for gc-chk255) than importing into a pack repo.
> And pack importing is *not* fast - it's about 10 times slower
> than git-fast-import or hg-fast-import from what I'm told. :-(
> Time for some profiling ...
>
> It seems that operations that walk directories in CHKInventories
> are slow, e.g. directories() and iter_changes(). As it turns out
> though, I don't *need* the path returned by iter_changes() in
> fast-import - just the inventory entries while loading texts.
>
> The attached patch adds a new method to CHKInventory called
> iter_non_root_entries(). Using it instead of iter_entries() cuts
> the fast-import time for gc-chk255 by half. Hooray.
>
> Ian C.
>
Oh, and I should also mention, I think I found a bug in how
"_entry_cache" is used. Specifically, it seems to be populated with:

    self._entry_cache[result.file_id] = result

which means it is keyed by a real "file_id". However, in the children()
property of CHKInventoryDirectory it does:

    child_ids = set()
    for (parent_id, name_utf8), file_id in parent_id_index.iteritems(
        key_filter=[(self.file_id,)]):
        child_ids.add((file_id,))

Note that it is adding a file_*key* here (a tuple of (file_id,)), and
then it goes on later to do:

    for file_id, bytes in id_to_entry.iteritems(child_ids):
        entry = self._chk_inventory._bytes_to_entry(bytes)
        result[entry.name] = entry
        self._chk_inventory._entry_cache[file_id] = entry

^- so here, when it inserts the newly created entries, it inserts
them *by key*, because the child_ids set actually holds child_keys
(1-tuples), not plain file_ids.
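
To make the mismatch concrete, here is a minimal sketch with plain dicts standing in for the real objects. The names entry_cache, fixed_cache, iter_items and the sample file ids are hypothetical, not bzrlib API; iter_items only imitates the (key, bytes) pairs that id_to_entry.iteritems() appears to yield above. The unpacking loop at the end is one possible cleanup, not necessarily what the real code should do.

```python
# Hypothetical stand-in for CHKInventory._entry_cache (a plain dict here).
entry_cache = {}


def iter_items(child_keys):
    """Imitate id_to_entry.iteritems(child_keys).

    Yields (key, bytes) pairs where each key is a 1-tuple such as
    (b"file-a",), not a bare file_id.
    """
    for key in sorted(child_keys):
        yield key, b"serialized:" + key[0]


child_keys = {(b"file-a",), (b"file-b",)}

# As in children(): the loop variable is named file_id but holds a key tuple.
for file_id, data in iter_items(child_keys):
    entry_cache[file_id] = data

# A lookup by the real file_id misses; only the tuple key finds the entry.
missed = entry_cache.get(b"file-a")
found_by_key = entry_cache.get((b"file-a",))

# One possible cleanup: unpack the 1-tuple so the cache is keyed by file_id.
fixed_cache = {}
for (file_id,), data in iter_items(child_keys):
    fixed_cache[file_id] = data

hit = fixed_cache.get(b"file-a")
```

With the tuple unpacked in the loop header, both code paths would store entries under the same kind of key, so an entry cached by children() could be found by a later lookup on file_id.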
I don't know the specific overhead involved, but I think there is a bit
we can clean up before we have to introduce a new API.
John
=:->