[MERGE][RFC] further add performance improvements
Robert Collins
robertc at robertcollins.net
Mon May 22 00:14:50 BST 2006
On Sat, 2006-05-20 at 08:33 -0500, John A Meinel wrote:
> Robert Collins wrote:
> > On Fri, 2006-05-19 at 09:22 -0500, John Arbash Meinel wrote:
>
> ...
>
> >>>
> >> I think we could have 'add' always add the files to the inventory. We
> >> just don't have to write the inventory to the disk when we are done.
> >
> > That was my example above - not calling write_inventory. Sounds like you
> > agreed.
> >
> > Do I remember you doing some profiling on inventory writing performance
> > at some point ?
> >
> > Rob
> >
>
> I did profiling on using cElementTree versus a manual to_xml converter.
> And I found that we didn't do a whole lot better/worse (I believe
> cElementTree uses ElementTree's python implementation for serializing to
> a string).
> The biggest thing that it let us do, was customize how we wrote the XML,
> so that it would play much nicer with Weaves. I could cut down the
> inventory.weave size drastically, like at least to 1/2.
Ah.
> We probably could get a similar improvement in knit sizes, though I
> would rather see us break up the inventory into per-directory stuff first.
Me too :)
> I would be curious to see if switching to RIO would gain us more of a
> speed improvement than using XML. Though I'm not confident, since we are
> using a compiled XML parser, versus a potential python parser (for RIO).
Martin indicated to me that RIO was up to twice as fast as cElementTree.
> We may want a way to write a delta-inventory. Since with a large tree
> you have to read the whole inventory, add a single line, and write out
> the whole thing again. But if we broke it into per-directory, that could
> probably be improved a lot.
Well, the specific case I'm looking at is add.
Of an add of 10824 files and directories, which takes me 4.8 seconds by
the wall clock, just over half was spent writing the inventory in the
working tree. I've optimised the cdata and attribute escaping routines
(50% faster) and it's now just under half the time:
     1  0  22042.0580  952.9310  bzrlib.add:94(smart_add_tree)
+10825  0    415.8310  279.5650  +posixpath:56(join)
  +585  0    418.7580  226.8350  +<posix.listdir>
+10825  0    884.0660  218.8230  +bzrlib.osutils:77(file_kind)
+10824  0   3113.0610  210.7580  +bzrlib.workingtree:1072(is_ignored)
+10824  0   3735.9380  139.4010  +bzrlib.add:231(__add_one)
+10825  0    492.6850  134.2070  +bzrlib.workingtree:381(abspath)
+11408  0    633.9420  132.0450  +posixpath:110(basename)
+10826  0    327.5540  129.5220  +bzrlib.workingtree:320(is_control_filename)
+10825  0     63.5100   63.5100  +<method 'append' of 'list' objects>
+10825  0     60.6600   60.6600  +bzrlib.inventory:331(versionable_kind)
+10824  0     59.6710   59.6710  +<method 'extend' of 'list' objects>
  +585  0   1941.7680   31.8630  +bzrlib.bzrdir:450(open)
    +1  0   8937.8270    0.2990  +bzrlib.decorators:48(write_locked)
(the last line is the call to wt._write_inventory).
Of that 8900:
 7  0  8862.1330    0.1280  bzrlib.xml_serializer:42(write_inventory)
+7  0  3115.3350  190.7810  +bzrlib.xml5:31(_pack_inventory)
+7  0  5746.6700    0.2030  +bzrlib.xml_serializer:68(_write_element)
5746 is spent writing the xml tree to disk, and 3115 converting the
inventory to the flattened tree.
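
As a rough check of those proportions, the figures quoted from the profile
can be turned into percentages. This is only arithmetic on the numbers
above; the variable names and the share() helper are made up for the
illustration.

```python
# Timings copied from the profile output above (profiler units).
total_add = 22042.0580        # bzrlib.add:94(smart_add_tree)
write_locked = 8937.8270      # the wt._write_inventory call
write_inventory = 8862.1330   # bzrlib.xml_serializer:42(write_inventory)
pack_inventory = 3115.3350    # bzrlib.xml5:31(_pack_inventory)
write_element = 5746.6700     # bzrlib.xml_serializer:68(_write_element)


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print("inventory write / whole add: %.1f%%" % share(write_locked, total_add))
print("flattening / write_inventory: %.1f%%" % share(pack_inventory, write_inventory))
print("writing xml / write_inventory: %.1f%%" % share(write_element, write_inventory))
```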
My intuition is that a single pass over the inventory with a custom
writer will be much faster. I guess I'll do one without worrying about
correctness, and if it is much faster, we can consider doing that for
real.
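
A minimal sketch of what such a single-pass writer might look like, to make
the idea concrete. It is not bzrlib code: the Entry tuple, the function name
and the format="sketch" attribute are all hypothetical, and it ignores the
real inventory format. Each entry is escaped and written straight to the
output, with no intermediate element tree.

```python
import io
from collections import namedtuple
from xml.sax.saxutils import quoteattr

# Hypothetical stand-in for an inventory entry; not bzrlib's real class.
Entry = namedtuple("Entry", "kind file_id name parent_id")


def write_inventory_single_pass(entries, out):
    """Write one XML element per entry directly to out, in a single pass."""
    out.write('<inventory format="sketch">\n')
    for entry in entries:
        # quoteattr() escapes the value and wraps it in quotes.
        out.write("<%s file_id=%s name=%s" % (
            entry.kind, quoteattr(entry.file_id), quoteattr(entry.name)))
        if entry.parent_id is not None:
            out.write(" parent_id=%s" % quoteattr(entry.parent_id))
        out.write(" />\n")
    out.write("</inventory>\n")


buf = io.StringIO()
write_inventory_single_pass(
    [Entry("directory", "d-1", "src", None),
     Entry("file", "f-1", "a&b.txt", "d-1")],
    buf)
print(buf.getvalue())
```

Whether this is actually faster than building and serializing a tree would
have to be measured; the sketch only shows the shape of the approach.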
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.