[RFC] Extract unannotated texts from Knits

John Arbash Meinel john at arbash-meinel.com
Wed Jul 25 22:32:21 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:

...

> 
>        1299ms get_lines
>             94ms working out what chunks to read
>            118ms keeping track of delta chains, etc
>            258ms reading the raw data chunks from disk
>            279ms parsing 650 fulltexts
>            260ms parsing 3.3k deltas
>             82ms applying 3.3k deltas to fulltexts
>            143ms verifying the sha1 sum of 650 fulltexts
> 
> We have about 94+118+258+143ms of general overhead that wouldn't really go
> away, so about 685ms to extract all the fulltexts and deltas from gzip hunks
> into lists, and apply them to get the final output texts.
> We have 650 files and 3.3k deltas for an average of 5 deltas per file, and
> 1.05ms to parse them and apply those 5 deltas.
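
The arithmetic in the quoted breakdown can be re-derived from the table. This
is only a sanity check of the numbers as listed above (the variable names are
mine, and 3.3k is taken as 3300, which is an approximation):

```python
# Timings in milliseconds, copied from the lsprof breakdown quoted above.
total_get_lines = 1299
overhead = {
    "working out what chunks to read": 94,
    "keeping track of delta chains": 118,
    "reading raw data chunks from disk": 258,
    "verifying sha1 sums": 143,
}

num_fulltexts = 650
num_deltas = 3300  # "3.3k" in the table, so an approximation

overhead_ms = sum(overhead.values())
# Time left for turning gzip hunks into lists and applying the deltas.
remaining_ms = total_get_lines - overhead_ms

per_file_ms = remaining_ms / num_fulltexts
deltas_per_file = num_deltas / num_fulltexts

print("overhead: %dms, remaining: %dms" % (overhead_ms, remaining_ms))
print("%.2fms per file, %.1f deltas per file" % (per_file_ms, deltas_per_file))
```

With the table's figures the remainder comes out within a millisecond or so of
the "about 685ms" quoted, and the per-file cost stays at roughly 1.05ms.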


One thing that caught my eye was just how much time we spent in 'bookkeeping'
to track what deltas we want to apply, etc.

So I rewrote get_lines() to not use get_line_list() [which is designed around
extracting multiple texts], so that it just uses a couple of simple functions to
read the blocks off disk and turn them into a single text in memory.
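
To illustrate the idea (this is not the code in the attached patch), here is a
rough sketch of single-text extraction: walk back from the requested version to
the nearest fulltext, then apply the line deltas forward. The record layout and
the names build_chain, apply_delta and extract_single_text are hypothetical,
and the records live in a plain dict rather than being read from a knit file.

```python
def build_chain(records, version_id):
    """Collect records from version_id back to the nearest fulltext.

    records maps a version id to (kind, parent_id, payload), where kind is
    'fulltext' (payload is a list of lines) or 'line-delta' (payload is a
    list of (start, end, new_lines) hunks). This layout is an assumption
    made for the sketch.
    """
    chain = []
    cursor = version_id
    while True:
        kind, parent_id, payload = records[cursor]
        chain.append((kind, payload))
        if kind == 'fulltext':
            return chain
        cursor = parent_id


def apply_delta(lines, delta):
    """Replace lines[start:end] with new_lines for each hunk.

    Hunks are assumed sorted, non-overlapping, and expressed as offsets
    into the old text.
    """
    result = []
    pos = 0
    for start, end, new_lines in delta:
        result.extend(lines[pos:start])
        result.extend(new_lines)
        pos = end
    result.extend(lines[pos:])
    return result


def extract_single_text(records, version_id):
    """Return the lines of one version, with no multi-text bookkeeping."""
    chain = build_chain(records, version_id)
    _, fulltext = chain.pop()  # the fulltext is the last entry collected
    lines = list(fulltext)
    for _, delta in reversed(chain):
        lines = apply_delta(lines, delta)
    return lines


records = {
    'a': ('fulltext', None, ['one\n', 'two\n', 'three\n']),
    'b': ('line-delta', 'a', [(1, 2, ['TWO\n'])]),
    'c': ('line-delta', 'b', [(3, 3, ['four\n'])]),
}
text_c = extract_single_text(records, 'c')
```

Because only one text is wanted, there is no need to track which intermediate
texts are shared between requests, which is where the multi-text path spends
its bookkeeping time.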

According to lsprof, this drops the 'get_lines()' time from 1299ms to 1104ms
(almost 200ms faster). And real-world timing shows about 100ms (out of 3.5s) saved.

This could probably be adapted to apply without my other changes, though on its
own it doesn't have nearly the same total effect.

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGp8FkJdeBCYSNAAMRAuoNAKC45pHO7PQjTEIZEMX2xemByFpD2gCeJhR9
Bfwp8wf6QKMLx2z32+7/oB0=
=ZyOT
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pyrex_knit_extract_single_text.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20070725/7ec63154/attachment.diff 

