Rev 58: Allowing the source bytes to be longer than expected, in http://bzr.arbash-meinel.com/plugins/groupcompress_rabin

John Arbash Meinel john at arbash-meinel.com
Fri Feb 27 20:18:49 GMT 2009


At http://bzr.arbash-meinel.com/plugins/groupcompress_rabin

------------------------------------------------------------
revno: 58
revision-id: john at arbash-meinel.com-20090227201847-181ruulj0worz3ra
parent: john at arbash-meinel.com-20090227195427-5rw3pjlgkssido0d
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 14:18:47 -0600
message:
  Allowing the source bytes to be longer than expected.
  This makes a huge difference for extraction speed: 10s versus 45s,
  compared to 17s for the original groupcompress code.
  
  
  Also, the compiled version in _groupcompress_c seems to be roughly the
  same speed as the patch-delta.c version; at the very least, the extra
  memory-copy overhead negates any benefit from compilation.
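  The change relaxes the source-size check in the delta header. As a rough
  Python sketch (not the project's Cython or C code): the header stores the
  expected source length as a little-endian base-128 varint, in the style of
  git's delta format which patch-delta.c follows, and the check now only
  rejects a source that is *shorter* than recorded, clamping the effective
  size otherwise. The helper names below mirror the C identifiers but are
  illustrative only.

```python
def get_delta_hdr_size(data, pos):
    """Decode a little-endian base-128 varint, as in git's delta format.

    Each byte contributes 7 bits of the value; the high bit marks a
    continuation byte. Returns (value, new_position).
    """
    size = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return size, pos


def check_source_size(delta, source_size):
    """Sketch of the relaxed check in this revision.

    The size recorded in the delta header may be smaller than the
    supplied source (the delta may have been made against only a
    prefix), but never larger. On success, the caller clamps its
    effective source size to the recorded value.
    """
    size, pos = get_delta_hdr_size(delta, 0)
    if size > source_size:
        return None  # genuinely mismatched: the delta needs more source
    return size, pos
```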
-------------- next part --------------
=== modified file '_groupcompress_c.pyx'
--- a/_groupcompress_c.pyx	2009-02-27 18:21:04 +0000
+++ b/_groupcompress_c.pyx	2009-02-27 20:18:47 +0000
@@ -144,9 +144,10 @@
     # make sure the orig file size matches what we expect
     # XXX: gcc warns because data isn't defined as 'const'
     size = get_delta_hdr_size(&data, top)
-    if (size != source_size):
+    if (size > source_size):
         # XXX: mismatched source size
         return None
+    source_size = size
 
     # now the result size
     size = get_delta_hdr_size(&data, top)

=== modified file 'groupcompress.py'
--- a/groupcompress.py	2009-02-27 19:54:27 +0000
+++ b/groupcompress.py	2009-02-27 20:18:47 +0000
@@ -475,8 +475,11 @@
                 else:
                     # TODO: relax apply_delta so that it can allow source to be
                     #       longer than expected
-                    chunks = [_groupcompress_c.apply_delta(
-                                plain[0:index_memo[3]], delta)]
+                    bytes = _groupcompress_c.apply_delta(plain, delta)
+                    if bytes is None:
+                        import pdb; pdb.set_trace()
+                    chunks = [bytes]
+                    del bytes
                 if sha_strings(chunks) != sha1:
                     raise AssertionError('sha1 sum did not match')
             yield ChunkedContentFactory(key, parents, sha1, chunks)
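The groupcompress.py hunk above replaces `plain[0:index_memo[3]]` with the
whole `plain` buffer. The motivation is that slicing a large bytes object
copies it, so every extraction previously paid for a full copy of the
compressed group's prefix. A rough timing sketch of that copy cost (the
sizes here are invented for illustration, not taken from the commit):

```python
import timeit

# Hypothetical sizes: a 40 MB group block, with deltas compressed
# against an 8 MB prefix of it.
plain = b'x' * (40 * 1024 * 1024)
prefix_len = 8 * 1024 * 1024

# Old approach: slice out the prefix for every extraction. Each slice
# of a bytes object allocates and copies prefix_len bytes.
slice_cost = timeit.timeit(lambda: plain[:prefix_len], number=100)

# New approach: hand apply_delta the whole buffer. It clamps the
# effective source size from the delta header itself, so no copy
# is made on the Python side.
no_copy_cost = timeit.timeit(lambda: plain, number=100)

print('slice: %.4fs  no copy: %.4fs' % (slice_cost, no_copy_cost))
```

The copying run should be orders of magnitude slower, which is consistent
with the 45s-to-10s extraction improvement described in the commit message.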

=== modified file 'patch-delta.c'
--- a/patch-delta.c	2009-02-27 17:32:04 +0000
+++ b/patch-delta.c	2009-02-27 20:18:47 +0000
@@ -27,8 +27,11 @@
 
 	/* make sure the orig file size matches what we expect */
 	size = get_delta_hdr_size(&data, top);
-	if (size != src_size)
+	/* MOD: We allow a bigger source, assuming we only compressed
+	   against the first bytes. */
+	if (size > src_size)
 		return NULL;
+	src_size = size;
 
 	/* now the result size */
 	size = get_delta_hdr_size(&data, top);
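To see the clamped check in context, here is a minimal pure-Python sketch of
a git-style patch-delta routine (the format patch-delta.c implements: varint
size headers followed by copy and insert instructions), with the `size >
src_size` relaxation from this revision. This is an illustration, not the
project's C code:

```python
def apply_delta(src, delta):
    """Apply a git-style binary delta to src; return bytes or None on error."""
    def hdr_size(data, pos):
        # little-endian base-128 varint, high bit = continuation
        size = shift = 0
        while True:
            b = data[pos]; pos += 1
            size |= (b & 0x7F) << shift
            shift += 7
            if not (b & 0x80):
                return size, pos

    size, pos = hdr_size(delta, 0)
    if size > len(src):
        return None       # MOD: only reject a source that is too short
    src_size = size       # clamp to the size the delta was made against
    result_size, pos = hdr_size(delta, pos)

    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]; pos += 1
        if cmd & 0x80:
            # copy instruction: low 4 bits select offset bytes,
            # next 3 bits select size bytes
            cp_off = cp_size = 0
            for i in range(4):
                if cmd & (1 << i):
                    cp_off |= delta[pos] << (i * 8); pos += 1
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    cp_size |= delta[pos] << (i * 8); pos += 1
            if cp_size == 0:
                cp_size = 0x10000
            if cp_off + cp_size > src_size:
                return None   # copies must stay inside the recorded prefix
            out += src[cp_off:cp_off + cp_size]
        elif cmd:
            # insert instruction: cmd literal bytes follow in the delta
            out += delta[pos:pos + cmd]; pos += cmd
        else:
            return None       # cmd == 0 is reserved
    if len(out) != result_size:
        return None
    return bytes(out)
```

With a delta whose header records an 11-byte source, a 20-byte source now
succeeds (copies are still confined to the first 11 bytes), while a 5-byte
source still returns None.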


