[RFC] Caching in chk_map.py - advice needed

Ian Clatworthy ian.clatworthy at internode.on.net
Sat Mar 7 15:14:06 GMT 2009

So deserialisation of nodes is taking 5-10% of the
time in fast-import and I'm thinking "it shouldn't
need to be doing that because I just serialised
those nodes". We *are* caching things in chk_map.py
but it's the mapping from

  key -> raw bytes

not the mapping from ??? -> nodes.
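For what it's worth, here is a minimal Python sketch of that second mapping. It
assumes the lookup key is simply the same content-hash (CHK) key that the bytes
cache already uses. Because that key is derived from the serialised bytes, two
different nodes cannot clash on it. Everything here is hypothetical and is not
bzrlib's API: LRUCache, NodeStore, write_node, read_node, the "sha1:" key
format and the JSON serialiser. The point is only that a node can go into the
cache at the moment it is serialised, so the next read of that key skips
deserialisation.

```python
import hashlib
import json
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache; a stand-in for illustration, not bzrlib.lru_cache."""

    def __init__(self, max_items):
        self._max = max_items
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def add(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used


class NodeStore:
    """Hypothetical store that caches raw bytes *and* deserialised nodes.

    Both caches are keyed by the content-hash key, so one key can only
    ever correspond to one serialised form.
    """

    def __init__(self, serialise, deserialise, max_items=1000):
        self._serialise = serialise
        self._deserialise = deserialise
        self._backing = {}  # stands in for the real repository storage
        self._bytes_cache = LRUCache(max_items)
        self._node_cache = LRUCache(max_items)
        self.deserialise_calls = 0

    def write_node(self, node):
        data = self._serialise(node)
        key = ("sha1:" + hashlib.sha1(data).hexdigest(),)
        self._backing[key] = data
        self._bytes_cache.add(key, data)
        # We already hold the node object, so remember it too: reading this
        # key back later need not deserialise the bytes we just produced.
        self._node_cache.add(key, node)
        return key

    def read_node(self, key):
        node = self._node_cache.get(key)
        if node is not None:
            return node
        data = self._bytes_cache.get(key)
        if data is None:
            data = self._backing[key]
            self._bytes_cache.add(key, data)
        self.deserialise_calls += 1
        node = self._deserialise(data)
        self._node_cache.add(key, node)
        return node


# Usage with toy JSON (de)serialisers standing in for the real node format.
store = NodeStore(
    serialise=lambda items: json.dumps(sorted(items.items())).encode("utf-8"),
    deserialise=lambda data: dict(json.loads(data.decode("utf-8"))),
)
key = store.write_node({"a": "1", "b": "2"})
assert store.read_node(key) == {"a": "1", "b": "2"}
assert store.deserialise_calls == 0  # served from the node cache
```

The catch is that read_node() hands back the very object that was written,
so this is only safe if nothing mutates a node once it has been cached.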

I've tried adding the latter using numerous attempts
at a lookup key, but they have all broken the test suite.
Can anyone tell me whether that's because what I'm trying
to do is conceptually wrong or not? Or are the tests
breaking because the sample data in there is ultra-
simplistic and therefore causing unexpected clashes?
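One possible reason for breakage that a content-hash key alone would not
prevent (an assumption, not a diagnosis of the attached patch) is that cached
node objects are shared. If a caller edits a node it got from the cache, every
later lookup of that key sees the edited node, which no longer matches the
bytes the key was computed from. LeafNode, get_node and the "sha1:abc" key in
this sketch are hypothetical stand-ins, not bzrlib's classes:

```python
class LeafNode:
    """Hypothetical stand-in for a chk_map leaf node (not bzrlib's class)."""

    def __init__(self, items):
        self.items = dict(items)

    def map(self, key, value):
        # Edits the node in place, as a node being modified would be.
        self.items[key] = value


_node_cache = {}


def get_node(chk_key, stored_items):
    """Return the node for chk_key, caching the deserialised object."""
    node = _node_cache.get(chk_key)
    if node is None:
        node = LeafNode(stored_items)  # "deserialise" the stored content
        _node_cache[chk_key] = node
    return node


stored = {("a",): "1"}
first = get_node(("sha1:abc",), stored)
first.map(("b",), "2")  # one caller edits what it thinks is its own node
second = get_node(("sha1:abc",), stored)
print(second.items)  # {('a',): '1', ('b',): '2'}: no longer what the key names
```

Handing each caller its own copy, or never mutating cached nodes in place,
avoids this at the cost of some of the saving.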

In the last 24 hours, I've managed to get fast-import
down from 49m to 3m on my sample data set. But I'm sure
there's still plenty of scope for tuning the chk-map
layer. Altogether, 32% of the time is spent in there,
while the groupcompress layer takes 20% and the parsing
layer 3.5%.
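To put those figures in perspective, here is a quick back-of-the-envelope
check using Amdahl's law. It assumes the percentages above (and the 5-10%
deserialisation figure) all refer to the 3-minute run; amdahl_speedup is just
the textbook formula written out, not anything in bzr:

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime becomes `factor`x faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


run_seconds = 3 * 60  # current fast-import time on the sample data
print("speedup so far: %.1fx" % ((49 * 60) / run_seconds))

for layer, share in [("chk_map", 0.32), ("groupcompress", 0.20), ("parsing", 0.035)]:
    best = amdahl_speedup(share, float("inf"))  # layer made infinitely fast
    print("%-13s %5.1fs of the run, best case %.2fx overall"
          % (layer, share * run_seconds, best))

# Node deserialisation alone: 5-10% of the run.
print("deserialisation: %.0f-%.0fs" % (0.05 * run_seconds, 0.10 * run_seconds))
```

Under those assumptions, even a perfect chk-map layer would make the 3m run at
most about 1.5x faster, and removing all node deserialisation would save
somewhere around 9-18s.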

On the bright side, importing into a gc-chk255 branch
is now twice as fast as importing into a 1.9 branch.
It's still a bit slower, though, than fast-import
was back in the old (pre-VersionedFiles) days: it used
to take a mere 1m50s IIRC (and the parsing layer is 30s
or more faster now).

Ian C.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chk-node-cache.diff
Type: text/x-diff
Size: 2594 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090308/f760ac01/attachment.bin 
