[MERGE] 10-15% faster inventory serialisation but changes canonical form
Robert Collins
robertc at robertcollins.net
Fri Sep 21 20:10:52 BST 2007
On Fri, 2007-09-21 at 13:55 -0500, John Arbash Meinel wrote:
> Well, you might consider rewriting it in a Pyrex extension.
>
> Our current "append = out.append" hack is because list appending is the
> fastest thing we could find in Python. if checks get a lot cheaper in Pyrex.
See my later updated version; it's faster still.
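
For readers unfamiliar with the "append = out.append" hack quoted above, a
minimal sketch follows. The function names and the (name, file_id) entry
shape are hypothetical, chosen only for illustration; this is not the bzrlib
serialiser itself.

```python
def serialise_plain(entries):
    """Append via the attribute lookup on every iteration."""
    out = []
    for name, file_id in entries:
        out.append(name)      # looks up out.append each time
        out.append(file_id)
    return out


def serialise_bound(entries):
    """Bind the list's append method to a local name once, then reuse it."""
    out = []
    append = out.append       # one attribute lookup, hoisted out of the loop
    for name, file_id in entries:
        append(name)
        append(file_id)
    return out


# Hypothetical sample data, not a real inventory.
entries = [("a.txt", "id-1"), ("b.txt", "id-2")]
flat_plain = serialise_plain(entries)
flat_bound = serialise_bound(entries)
```

Both functions build the same list; the second only avoids repeating the
attribute lookup inside the loop.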
> But what is the total time for inventory serialization? I have the
> feeling it is maybe 1s for a Moz tree. Which means at best you can save
> only 1 second of commit time. Which starts to matter a bit for
> incremental commits (because we are creating a full inventory and then
> diffing, etc).
I'm aiming for 5 seconds for incremental commits for Moz trees;
serialisation currently takes about 900ms, plus diff and gzip time.
> Oh, and you asked off list about sha1 sums, etc. And I'm pretty sure
> that our "osutils.sha_strings()" function is quite fast. (Calculating
> the sha1 of 485 files takes 190ms on my machine.)
FWIW I saved 10 seconds of incremental commit by moving to sha_string
even with an additional string join.
    timeit -s 'data = ...' -s 'from bzrlib.osutils import sha_strings' \
        'for x in xrange(50000): sha_strings(data)'
(and likewise for sha_string), tested with 800 80-character lines, took
30 seconds for sha_strings and 20 for sha_string, and real-world use
confirms this.
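
The two approaches compared above can be sketched with hashlib. The bodies
below are stand-ins written for illustration and are assumed, not copied,
from bzrlib.osutils; only the names sha_strings and sha_string come from
this thread.

```python
import hashlib


def sha_strings(strings):
    """Hash a sequence of chunks with one update() call per chunk."""
    s = hashlib.sha1()
    for chunk in strings:
        s.update(chunk)
    return s.hexdigest()


def sha_string(text):
    """Hash one already-joined string with a single call."""
    return hashlib.sha1(text).hexdigest()


# 800 lines of 80 characters each (79 bytes plus a newline), as in the test.
lines = [b"x" * 79 + b"\n" for _ in range(800)]

digest_chunked = sha_strings(lines)
digest_joined = sha_string(b"".join(lines))  # pays for the extra join
```

Both calls produce the same digest; the difference being measured is many
small update() calls versus one join followed by a single hash call.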
> So I would guess that building up the Inventory into a list of strings
> and passing that directly to the patience code would be better than
> serializing all the way down to a pure string, and then building it back up.
Yup, that's already done and in bzr.dev.
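
As a rough illustration of diffing lists of lines directly, here is a sketch
using the standard library's difflib.SequenceMatcher as a stand-in for the
patience code. The sample lines are made up and do not reflect the real
inventory format.

```python
import difflib

# Hypothetical serialised inventories, already held as lists of lines.
old_lines = ["file a id-1\n", "file b id-2\n", "file c id-3\n"]
new_lines = ["file a id-1\n", "file b id-2b\n", "file c id-3\n"]

# The matcher compares the two lists element by element (line by line),
# so neither side has to be joined into one string and split again.
matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
changed = [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

Here `changed` holds only the opcodes for regions that differ between the
two lists.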
> The flip side is that you might end up creating 1 list per line (which
> you then ''.join()) so it might cost a bit of malloc time, etc.
The code was small enough that I just flattened each type to a single
string creation call.
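
A minimal sketch of that flattening, assuming a made-up one-line entry
format (not the real inventory XML, and with no escaping): instead of
building a small list of pieces per entry and joining it, each entry is
produced by one string-formatting call.

```python
def entry_line_pieces(kind, name, file_id):
    """Build the line from a per-entry list of pieces, then join it."""
    pieces = ["<", kind, ' name="', name, '" file_id="', file_id, '" />\n']
    return "".join(pieces)


def entry_line_single(kind, name, file_id):
    """Build the same line with a single string creation call."""
    return '<%s name="%s" file_id="%s" />\n' % (kind, name, file_id)


# Hypothetical entries for illustration only.
sample = [("file", "a.txt", "id-1"), ("directory", "src", "id-2")]
lines_pieces = [entry_line_pieces(*entry) for entry in sample]
lines_single = [entry_line_single(*entry) for entry in sample]
```

The output is identical either way; the single-call version just avoids
allocating a temporary list for every line.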
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.