[brisbane-core MERGE] CHKInventory.iter_non_root_entries()

John Arbash Meinel john at arbash-meinel.com
Fri Mar 6 21:15:39 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> So fast-import now kind of works for CHK repo formats, give or
> take 'deleteall' directives (as used by e.g. darcs-fast-export).
> Unfortunately though, it's rather slow at the moment - about
> 5-7 times slower (for gc-chk255) than importing into a pack repo.
> And pack importing is *not* fast - it's about 10 times slower
> than git-fast-import or hg-fast-import from what I'm told. :-(
> Time for some profiling ...
> 
> It seems that operations that walk directories in CHKInventories
> are slow, e.g. directories() and iter_changes(). As it turns out
> though, I don't *need* the path returned by iter_changes() in
> fast-import - just the inventory entries while loading texts.
> 
> The attached patch adds a new method to CHKInventory called
> iter_non_root_entries(). Using it instead of iter_entries() cuts
> the fast-import time for gc-chk255 by half. Hooray.
> 
> Ian C.
> 

Oh, and I should also mention, I think I found a bug in how
"_entry_cache" is used. Specifically it seems to be populated with:

        self._entry_cache[result.file_id] = result


That means the cache is keyed by a real "file_id". However, the
children() property of CHKInventoryDirectory does:

        child_ids = set()
        for (parent_id, name_utf8), file_id in parent_id_index.iteritems(
            key_filter=[(self.file_id,)]):
            child_ids.add((file_id,))

Note that it is adding a file_*key* here (a tuple of (file_id,)), and
then later it goes on to do:


        for file_id, bytes in id_to_entry.iteritems(child_ids):
            entry = self._chk_inventory._bytes_to_entry(bytes)
            result[entry.name] = entry
            self._chk_inventory._entry_cache[file_id] = entry
^- So here, when it inserts the newly created entries, it inserts
them *by key*, because the child_ids set actually holds child_keys, so
the loop variable named file_id is really a (file_id,) tuple.
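
To make the mismatch concrete, here is a toy sketch with a plain dict
standing in for _entry_cache. The helper names and the sample ids are
made up for illustration and are not bzrlib code, and it assumes that
lookups are done by the plain file_id:

```python
# Toy stand-in for _entry_cache; everything here is hypothetical
# illustration, not the real CHKInventory implementation.
entry_cache = {}


def cache_by_file_id(file_id, entry):
    # Mirrors the first snippet: keyed by the plain file_id string.
    entry_cache[file_id] = entry


def cache_by_key(file_key, entry):
    # Mirrors the children() loop: keyed by the 1-tuple (file_id,).
    entry_cache[file_key] = entry


cache_by_key(("file-a",), "entry-a")
cache_by_file_id("file-b", "entry-b")

# A lookup by plain file_id never sees what the children() path stored.
miss = entry_cache.get("file-a")
hit = entry_cache.get("file-b")


def cache_unwrapped(file_key, entry):
    # One possible cleanup: unpack the key tuple before storing.
    (file_id,) = file_key
    entry_cache[file_id] = entry


cache_unwrapped(("file-a",), "entry-a")
fixed = entry_cache.get("file-a")
```

If that assumption holds, the entries stored under tuple keys are never
hit and just take up space next to the ones stored by file_id.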

I don't know the specific overhead involved, but I think there is some
cleanup we can do before we have to introduce a new API.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmxknsACgkQJdeBCYSNAAOUbgCeLLZHHyIYGSVAFNb+CINu1F38
k+oAmgMJGWOhwil/c92Ckf+UfNWEKsnp
=XBkM
-----END PGP SIGNATURE-----


