Rev 40: Change the code a little bit. in http://bazaar.launchpad.net/%7Ejameinel/bzr-groupcompress/experimental

John Arbash Meinel john at arbash-meinel.com
Thu Feb 19 20:50:35 GMT 2009


At http://bazaar.launchpad.net/%7Ejameinel/bzr-groupcompress/experimental

------------------------------------------------------------
revno: 40
revision-id: john at arbash-meinel.com-20090219204834-27ltrakcvdmlpqa8
parent: john at arbash-meinel.com-20090219204500-1wb0k8f962lansy6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: experimental
timestamp: Thu 2009-02-19 14:48:34 -0600
message:
  Change the code a little bit.
  
  If a text's key prefix has not been seen before, insert all lines for that
  text. At present, we are doing *worse* than knit compression, because we
  match so many groups from various locations that the output is just
  huge swaths of copies.
  
  By inserting the full lines, we get more regions against which later
  texts can generate a larger match.
  
  This slows down the processing (10m => 24m), but improves compression
  (16MB => 12MB).
-------------- next part --------------
=== modified file 'groupcompress.py'
--- a/groupcompress.py	2009-02-19 20:45:00 +0000
+++ b/groupcompress.py	2009-02-19 20:48:34 +0000
@@ -156,6 +156,7 @@
         self.line_locations = self._equivalence_table_class([])
         self.lines = self.line_locations.lines
         self.labels_deltas = {}
+        self._present_prefixes = set()
 
     def get_matching_blocks(self, lines):
         """Return an the ranges in lines which match self.lines.
@@ -216,7 +217,15 @@
         range_start = 0
         flush_range = self.flush_range
         copy_ends = None
-        blocks = self.get_matching_blocks(lines)
+        blocks = None
+        if len(key) > 1:
+            prefix = key[0]
+            if prefix not in self._present_prefixes:
+                self._present_prefixes.add(prefix)
+                # Mark this as not matching anything
+                blocks = [(0, len(lines), 0)]
+        if blocks is None:
+            blocks = self.get_matching_blocks(lines)
         current_pos = 0
         # We either copy a range (while there are reusable lines) or we 
         # insert new lines. To find reusable lines we traverse 
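
For readers without the full groupcompress.py to hand, the prefix check in the
hunk above can be sketched in isolation. This is only an illustration under
stated assumptions: the class name PrefixAwareMatcher and the method blocks_for
are hypothetical, difflib.SequenceMatcher stands in for the plugin's own
equivalence-table matcher, and every text is appended in full to self.lines
(the real compressor only appends the ranges it inserts).

```python
import difflib


class PrefixAwareMatcher:
    """Hypothetical stand-in for the compressor touched by this revision."""

    def __init__(self):
        self.lines = []                 # lines stored in the group so far
        self._present_prefixes = set()  # key prefixes already inserted in full

    def get_matching_blocks(self, lines):
        # Assumption: difflib replaces the real equivalence-table lookup.
        matcher = difflib.SequenceMatcher(None, self.lines, lines,
                                          autojunk=False)
        return [tuple(block) for block in matcher.get_matching_blocks()]

    def blocks_for(self, key, lines):
        blocks = None
        if len(key) > 1:
            prefix = key[0]
            if prefix not in self._present_prefixes:
                self._present_prefixes.add(prefix)
                # Only the zero-length terminator: nothing matches, so the
                # caller ends up inserting every line of this text.
                blocks = [(0, len(lines), 0)]
        if blocks is None:
            blocks = self.get_matching_blocks(lines)
        # Simplification: store the whole text, not just the inserted ranges.
        self.lines.extend(lines)
        return blocks


matcher = PrefixAwareMatcher()
text = ["a\n", "b\n", "c\n"]
first = matcher.blocks_for(("file-1", "rev-1"), text)   # new prefix
second = matcher.blocks_for(("file-1", "rev-2"), text)  # prefix already seen
third = matcher.blocks_for(("file-2", "rev-1"), text)   # new prefix again
```

Here the first and third calls skip matching entirely even though the third
text is identical to lines already in the group, which is the trade-off the
commit message describes: more full-text inserts now, in exchange for longer
contiguous regions for later texts to match against.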


