RFC: The "xml8" patch

John Arbash Meinel john at arbash-meinel.com
Mon Nov 30 03:59:41 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Some conversion speed issues cropped up again recently, which reminded
me to test things. Which did lead to this patch:
https://code.edge.launchpad.net/~jameinel/bzr/2.1.0b4-convert-kg-heads/+merge/15394

It also reminded me of the 'xml8' patch that I've had lying around
forever. Namely:

=== modified file 'bzrlib/xml8.py'
- --- bzrlib/xml8.py      2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py      2009-11-30 03:40:50 +0000
@@ -433,10 +433,10 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
- -                # if cached_ie.kind == 'directory':
- -                #     return cached_ie.copy()
- -                # return cached_ie
- -                return cached_ie.copy()
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
+                # return cached_ie.copy()

         kind = elt.tag
         if not InventoryEntry.versionable_kind(kind):


If you look at the attached graphs, you can see that it saves about
1/3rd of the total conversion time by just uncommenting those lines. My
understanding is that it:

1) Saves extraction time by not copying all entries
2) Saves '_make_delta()' times because we don't have to check all
attributes of objects which are otherwise identical.
3) Probably also saves on some small aspects of memory consumption, etc.

The main reason it hasn't been included is because it isn't strictly
safe as is. Namely at least the test suite likes to take an Inventory
and then mutate attributes of its entries directly. (And mutating an
item that is in a cache is a bad thing.)

It would be good to figure out a way to do it safely, because it
obviously can make a big difference in overall conversion times.

1) Pass a parameter down all the way from Repo.revision_trees()
indicating that it is safe to use cached items. The main ugliness of
this is that there is a fairly high number of layers the flag needs to
go through.

 repo.revision_trees()
 repo.iter_inventories()
 repo._iter_inventories()
 repo.deserialise_inventory()
 repo._serializer.read_inventory_from_string()
 repo._serializer._unpack_inventory()
 repo._serializer._unpack_entry()

2) Set a flag on repo._serializer._use_from_cache

  This isn't strictly safe, in that it both isn't thread safe, (this
  thread is doing safe conversions, but that thread is grabing
  repo.revision_tree().inventory and mutating it.)

  However (1) is just really ugly to me. Do other people feel
  differently?

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAksTQy0ACgkQJdeBCYSNAANuNQCfQowchRUf20QkzMMvM5F/Ha5O
WIcAoI8C+AC0CTs6Z2Byr+ryZJk506FU
=gctf
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: conversion_times.png
Type: image/png
Size: 60286 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20091129/f98a1def/attachment-0001.png 


More information about the bazaar mailing list