Rev 40: Change the code a little bit. in http://bazaar.launchpad.net/%7Ejameinel/bzr-groupcompress/experimental
John Arbash Meinel
john at arbash-meinel.com
Thu Feb 19 20:50:35 GMT 2009
At http://bazaar.launchpad.net/%7Ejameinel/bzr-groupcompress/experimental
------------------------------------------------------------
revno: 40
revision-id: john at arbash-meinel.com-20090219204834-27ltrakcvdmlpqa8
parent: john at arbash-meinel.com-20090219204500-1wb0k8f962lansy6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: experimental
timestamp: Thu 2009-02-19 14:48:34 -0600
message:
Change the code a little bit.
If no text with the same key prefix has been seen before, insert all lines for that text.
At present, we are doing *worse* than knit compression, because we have
so many matching groups from various locations, which leaves us with
huge swaths of copies.
By inserting the full lines, we get more regions against which we can
generate larger matches.
This slows down processing (10 min => 24 min), but improves compression
(16 MB => 12 MB).
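A minimal, self-contained sketch of the heuristic described above, assuming keys are tuples whose first element identifies the file (the diff uses `key[0]` as the prefix). The class name `PrefixAwareMatcher` and the method `blocks_for` are hypothetical, and `difflib.SequenceMatcher` stands in for the equivalence-table matcher that groupcompress.py really uses, so the exact matches will differ from the real code.

```python
from difflib import SequenceMatcher


class PrefixAwareMatcher:
    """Hypothetical toy model of the "insert the first text in full" rule.

    difflib.SequenceMatcher replaces the real equivalence-table matcher.
    """

    def __init__(self):
        self.lines = []                 # every line already stored in the group
        self._present_prefixes = set()  # key[0] values seen so far

    def get_matching_blocks(self, lines):
        # (old_start, new_start, length) triples, ending in a zero-length entry
        matcher = SequenceMatcher(None, self.lines, lines, autojunk=False)
        return [tuple(block) for block in matcher.get_matching_blocks()]

    def blocks_for(self, key, lines):
        blocks = None
        if len(key) > 1:
            prefix = key[0]
            if prefix not in self._present_prefixes:
                self._present_prefixes.add(prefix)
                # First text for this prefix: report no matches at all, so the
                # whole text is inserted and later texts can match against it.
                blocks = [(0, len(lines), 0)]
        if blocks is None:
            blocks = self.get_matching_blocks(lines)
        # Toy simplification: store the full text so later calls can match it.
        self.lines.extend(lines)
        return blocks
```

For example, `blocks_for(("file-a", "rev-1"), ["x\n", "y\n", "z\n"])` returns `[(0, 3, 0)]`, while a second text for `file-a` gets real matches. Note that the first text of a new prefix is stored in full even when some of its lines already exist in the group.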
-------------- next part --------------
=== modified file 'groupcompress.py'
--- a/groupcompress.py 2009-02-19 20:45:00 +0000
+++ b/groupcompress.py 2009-02-19 20:48:34 +0000
@@ -156,6 +156,7 @@
         self.line_locations = self._equivalence_table_class([])
         self.lines = self.line_locations.lines
         self.labels_deltas = {}
+        self._present_prefixes = set()
 
     def get_matching_blocks(self, lines):
         """Return an the ranges in lines which match self.lines.
@@ -216,7 +217,15 @@
         range_start = 0
         flush_range = self.flush_range
         copy_ends = None
-        blocks = self.get_matching_blocks(lines)
+        blocks = None
+        if len(key) > 1:
+            prefix = key[0]
+            if prefix not in self._present_prefixes:
+                self._present_prefixes.add(prefix)
+                # Mark this as not matching anything
+                blocks = [(0, len(lines), 0)]
+        if blocks is None:
+            blocks = self.get_matching_blocks(lines)
         current_pos = 0
         # We either copy a range (while there are reusable lines) or we
         # insert new lines. To find reusable lines we traverse
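The diff stops just before the loop that consumes `blocks`. Here is a minimal sketch of what such a loop could look like, assuming `blocks` holds `(old_start, new_start, length)` triples in difflib's convention, ending with a zero-length entry. Under that assumption, `[(0, len(lines), 0)]` contains no real match, so the whole new text comes out as a single insert. `blocks_to_instructions` is a hypothetical name, not a function from groupcompress.py.

```python
def blocks_to_instructions(lines, blocks):
    """Hypothetical helper: turn match triples into copy/insert operations.

    Assumes blocks are (old_start, new_start, length) triples sorted by
    new_start, with a final zero-length entry marking the end of `lines`.
    """
    instructions = []
    current_pos = 0  # next line of `lines` not yet emitted
    for old_start, new_start, length in blocks:
        if new_start > current_pos:
            # Lines with no match are inserted literally.
            instructions.append(("insert", lines[current_pos:new_start]))
        if length:
            # Matched lines become a copy from content already in the group.
            instructions.append(("copy", old_start, length))
        current_pos = new_start + length
    return instructions
```

With `text = ["a\n", "b\n", "c\n"]`, `blocks_to_instructions(text, [(0, 3, 0)])` yields one insert of all three lines, which is the effect the "Mark this as not matching anything" branch is after.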
More information about the bazaar-commits mailing list