[MERGE] Faster 'build_tree'

John Arbash Meinel john at arbash-meinel.com
Tue Jul 24 14:36:25 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was poking around our knit extraction code, because it is one of our current
bottlenecks, and I found a few areas that could use some cleanup.

Attached is a patch which includes a few improvements and makes 'bzr checkout'
about 10% faster (it usually takes 5s for bzr.dev, and this drops it to around
4.5s).

It also focuses on making sure we are using more purpose-built APIs. So instead
of doing "tree.get_file(file_id).readlines()" we use
"tree.get_file_lines(file_id)".
One of the big reasons for that is that different trees have different
implementation needs. Knits are read as a line-based format, so
get_file(file_id).readlines() has to read the lines into a list, combine them
into a single string to put in a StringIO(), and then have the StringIO() split
them back into lines.

Surprisingly enough, that was actually pretty quick.
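
To make that round trip concrete, here is a minimal sketch. ToyLineTree is a
hypothetical stand-in, not the bzrlib tree class, and it uses Python 3's
io.StringIO rather than the StringIO module we use today. It only shows that
both paths return the same lines, with the generic one doing an extra join and
re-split along the way.

```python
import io


class ToyLineTree:
    """Hypothetical tree whose backing store already holds lists of lines."""

    def __init__(self, texts):
        # Maps file_id -> list of lines, the way a line-based store keeps them.
        self._texts = texts

    def get_file(self, file_id):
        # Generic API: join the lines into one string and wrap it in a
        # file-like object, which the caller then splits back into lines.
        return io.StringIO("".join(self._texts[file_id]))

    def get_file_lines(self, file_id):
        # Purpose-built API: return a copy of the lines we already have.
        return list(self._texts[file_id])


tree = ToyLineTree({"file-id-1": ["first\n", "second\n", "third\n"]})

via_file = tree.get_file("file-id-1").readlines()
direct = tree.get_file_lines("file-id-1")

# Same result either way; the second path skips the join and the re-split.
assert via_file == direct
```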

The biggest fix here was to switch from "_get_content_maps()" to
"_get_text_map()". When we were extracting texts it was causing us to read and
build up all the annotations, which we then just threw away.
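
As a rough illustration of why that matters, the toy functions below mimic the
two shapes of work. The names build_content_map and build_text_map and the
record layout are made up for this sketch; they are not the real
"_get_content_maps()" or "_get_text_map()" implementations.

```python
def build_content_map(records):
    """Toy version: materialise (origin, line) pairs for every text."""
    return {
        key: [(origin, line) for origin, line in annotated]
        for key, annotated in records.items()
    }


def build_text_map(records):
    """Toy version: keep only the line text and never build annotations."""
    return {
        key: [line for _origin, line in annotated]
        for key, annotated in records.items()
    }


# Hypothetical stored form: each line carries the revision that introduced it.
records = {"rev-2": [("rev-1", "hello\n"), ("rev-2", "world\n")]}

# Old path: build the annotated content, then strip the annotations off again.
old_texts = {
    key: [line for _origin, line in content]
    for key, content in build_content_map(records).items()
}

# New path: go straight to the text.
new_texts = build_text_map(records)

assert old_texts == new_texts
```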

I think this would be a good place to insert some Pyrex if someone is
interested. (If nobody replies, I'll probably give it a lookover in a day or
so.) I *think* that Python's gzip code is a bit inefficient, but I'm not
positive on that. I do know that the profiler shows quite a bit of time spent
in gzip.readlines(), which doesn't quite fit how long it should actually take
to extract that text.
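
One way to check that suspicion outside the profiler is a small timing script
like the sketch below. It is written for Python 3 (gzip.compress and
gzip.decompress do not exist in the Python we currently run), the helper names
and the sample data are invented for the example, and it makes no claim about
which path wins; it just compares reading lines through GzipFile against
decompressing in one shot and splitting afterwards.

```python
import gzip
import io
import timeit


def make_blob(n_lines=20000):
    # Synthetic stand-in for a compressed text; not real knit data.
    raw = b"".join(b"line %d of some sample text\n" % i for i in range(n_lines))
    return gzip.compress(raw)


def lines_via_readlines(blob):
    # Read line by line through the GzipFile wrapper.
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as f:
        return f.readlines()


def lines_via_decompress(blob):
    # Decompress the whole blob at once, then split, keeping line endings.
    return gzip.decompress(blob).splitlines(True)


blob = make_blob()

# Both approaches must agree before the timings mean anything.
assert lines_via_readlines(blob) == lines_via_decompress(blob)

for fn in (lines_via_readlines, lines_via_decompress):
    elapsed = timeit.timeit(lambda: fn(blob), number=20)
    print(fn.__name__, elapsed)
```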

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGpgBYJdeBCYSNAAMRAg3sAJ9WIFzHJjYimgOOmOx1Ir6j0tgdngCfTC6u
h5d9XuC6aggA7eI5CNSQMPI=
=WZM9
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: faster_knit_extract.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070724/bf58bc87/attachment-0001.diff 

