[brisbane-core MERGE] CHKInventory.iter_non_root_entries()
John Arbash Meinel
john at arbash-meinel.com
Fri Mar 6 21:23:04 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> Ian Clatworthy wrote:
>> So fast-import now kind of works for CHK repo formats, give or
>> take 'deleteall' directives (as used by e.g. darcs-fast-export).
>> Unfortunately though, it's rather slow at the moment - about
>> 5-7 times slower (for gc-chk255) than importing into a pack repo.
>> And pack importing is *not* fast - it's about 10 times slower
>> than git-fast-import or hg-fast-import from what I'm told. :-(
>> Time for some profiling ...
>
>> It seems that operations that walk directories in CHKInventories
>> are slow, e.g. directories() and iter_changes(). As it turns out
>> though, I don't *need* the path returned by iter_changes() in
>> fast-import - just the inventory entries while loading texts.
>
>> The attached patch adds a new method to CHKInventory called
>> iter_non_root_entries(). Using it instead of iter_entries() cuts
>> the fast-import time for gc-chk255 by half. Hooray.
>
>> Ian C.
>
>
> Oh, and I should also mention, I think I found a bug in how
> "_entry_cache" is used. Specifically it seems to be populated with:
>
>     self._entry_cache[result.file_id] = result
>
>
> Which means it uses a real "file_id". However in the def children()
> property of CHKInventoryDirectory it does:
>
>     child_ids = set()
>     for (parent_id, name_utf8), file_id in parent_id_index.iteritems(
>         key_filter=[(self.file_id,)]):
>         child_ids.add((file_id,))
>
> Note that it is adding a file_*key* here (a tuple of (file_id,)), and
> then it goes on later to do:
>
>
>     for file_id, bytes in id_to_entry.iteritems(child_ids):
>         entry = self._chk_inventory._bytes_to_entry(bytes)
>         result[entry.name] = entry
>         self._chk_inventory._entry_cache[file_id] = entry
>         ^- so here when it is inserting the newly created entries, it is
>            inserting them *by key*, because the child_ids set is actually
>            child_keys, etc.
>
> I don't know the specific overhead involved, but I think there is a bit
> we can clean up before we have to introduce a new api.
>
> John
> =:->
Here is a patch that should clean up the file_id/file_key confusion in
the '.children()' function.
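
For anyone who wants to see the mismatch in isolation, here is a minimal
sketch. It uses plain dicts and made-up names (cache_by_key, cache_by_id,
the "file-a"/"entry-a" values) as stand-ins; none of this is the real
bzrlib code or the attached patch, it only assumes that id_to_entry hands
back keys as 1-tuples, as described above.

```python
# Hypothetical stand-ins for illustration; not the real bzrlib classes.

def cache_by_key(items):
    """Mimic the behaviour described above: cache under the (file_id,) tuple."""
    cache = {}
    for file_key, entry in items:
        cache[file_key] = entry
    return cache


def cache_by_id(items):
    """Unwrap the 1-tuple key so the cache is keyed by the plain file_id."""
    cache = {}
    for (file_id,), entry in items:
        cache[file_id] = entry
    return cache


# Assumed shape: (key, value) pairs where each key is a 1-tuple.
items = [(("file-a",), "entry-a"), (("file-b",), "entry-b")]

buggy = cache_by_key(items)
fixed = cache_by_id(items)

# A lookup by plain file_id misses the tuple-keyed cache entirely.
print("file-a" in buggy)   # False
print("file-a" in fixed)   # True
```

With tuple keys, the entries are in the cache but a lookup by plain
file_id will not find them.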
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkmxlDgACgkQJdeBCYSNAAN9CQCfXrljM36bTJZB8y4doHNIRfT5
KH0An3Lc5Ktpx+vyeNyyuzAPyGMgGzmd
=Gdg/
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: chkdirectory_caching.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090306/f71af0cd/attachment.diff