Rev 5543: A TODO entry. in http://bazaar.launchpad.net/~jameinel/bzr/2.3-gcb-peak-mem

Thu Dec 2 21:42:54 GMT 2010

------------------------------------------------------------
revno: 5543
revision-id: john at arbash-meinel.com-20101202214246-cpvdr4s5bdt52r59
parent: john at arbash-meinel.com-20101202212630-vycb3zf5uy5iz2tc
fixes bug(s): https://launchpad.net/bugs/602614
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-gcb-peak-mem
timestamp: Thu 2010-12-02 15:42:46 -0600
message:
  A TODO entry.
  
  Overall this experiment hasn't been particularly beneficial. The final speed
  is still slower than the existing code, and the primary knob that reduced
  peak memory size is changing the stride for large content, which can be
  trivially added to the existing match code.
  
  I like the code quite a bit, but I wish I had more to show for the amount
  of effort put into it.
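
As a rough illustration of why the stride is such an effective knob: an index
that records one entry per sampled offset holds about len(content) / stride
entries, so a larger stride for large content shrinks the table in proportion.
The sketch below is not taken from bzrlib. The function names, the 16-byte
window, the size threshold, and both stride values are assumptions chosen only
to make the arithmetic visible.

```python
def sample_offsets(content_len, window=16, base_stride=16,
                   large_threshold=1024 * 1024, large_stride=64):
    """Return the start offsets that would receive an index entry.

    Every parameter name and default here is hypothetical; the real
    matcher may choose its window, threshold, and strides differently.
    """
    # Use the coarser stride only once the content is "large".
    if content_len >= large_threshold:
        stride = large_stride
    else:
        stride = base_stride
    # Only offsets followed by a full window of bytes can be hashed.
    last_start = content_len - window
    if last_start < 0:
        return []
    return list(range(0, last_start + 1, stride))


def estimated_entries(content_len, **kwargs):
    """Count the index entries produced for content of the given length."""
    return len(sample_offsets(content_len, **kwargs))
```

Under these assumed defaults, 2 MiB of content yields a quarter as many
entries with the coarser stride as it would with the base stride, which is
the kind of saving the message attributes to this knob.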
=== modified file 'bzrlib/_delta_index_pyx.pyx'

--- a/bzrlib/_delta_index_pyx.pyx	2010-12-02 21:26:30 +0000
+++ b/bzrlib/_delta_index_pyx.pyx	2010-12-02 21:42:46 +0000
@@ -108,6 +108,14 @@
 atexit.register(report_total_time)
 
 
+# TODO: This is the primary table entry in the hash map. As such, this is the
+#       part that scales O(N) and thus can provide the largest memory savings.
+#       We should determine what attributes are really necessary.
+#       Also, we currently store the hash entries inline, which means each
+#       'empty' entry is a full rabin_entry in size. We could, instead,
+#       allocate a table of rabin_entry, and then pointers into that table for
+#       the hash table. This would reduce the cost of an empty hash slot, at
+#       the cost of adding O(N) pointers.
 cdef struct rabin_entry:
     # A pointer to the actual matching bytes
     const_data ptr