[MERGE] Make _nodes_by_key lazy

John Arbash Meinel john at arbash-meinel.com
Mon Aug 25 20:15:07 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The attached patch changes "*Index.add_node" so that it doesn't have to update
the _nodes_by_key dictionary. In my earlier profiling, I found a significant
amount of time spent there. (Out of ~7s, we were spending 1.6s updating
_nodes_by_key, which was then never used).

It also changes the BTreeBuilder._nodes dictionary to not worry about "absent"
keys. It shaves about 200ms off building bzr.dev indices. I also moved the
"sorted()" into _iter_mem_nodes. This turned out to be a massive savings. We
know that the only part of the tuple that matters is the "key" and changing
sorted(self._nodes) instead of sorting [(self, key, value, references)] shaves
off something like 600ms when building my bzr repository indices.

Overall, this patch changes it from 9.0s => 7.2s for building my bzr.dev
repository.

There is also a small bugfix that shouldn't really matter. Specifically in
ChunkWriter.finish():

- -        nulls_needed = self.chunk_size - self.bytes_out_len % self.chunk_size
+        nulls_needed = self.chunk_size - self.bytes_out_len

The bug is if we have exactly "bytes_out_len". The old version would give 0,
which causes us to generate an extra chunk of all null.

It turns out that we don't hit that with the current code, because we always
give a 10-byte buffer for Z_SYNC_FINISH. But it only takes 6 bytes. So we
always are 4-bytes shy of full. (I discovered this when I changed the code to
not use compression, and then I was hitting exactly full pages.)

My next patch will revise my updates to the SYNC logic, to allow single-pass
compression most of the time. It should bring the bzr.dev time down to under 5s.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIswS7JdeBCYSNAAMRAkMGAJ907kPrw1/b7PxCGcENj8gG5q9C+QCgplos
PHFLAWxYFQDXmX6wpcUVWhg=
=Ze8N
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: index_lazy_nodes_by_key.patch
Type: text/x-diff
Size: 27364 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080825/d95dca6e/attachment-0001.bin 


More information about the bazaar mailing list