[RFC] Cheap re-pack?

John Arbash Meinel john at arbash-meinel.com
Fri Sep 28 17:47:47 BST 2007


Aaron Bentley wrote:
> Robert Collins wrote:
>> On Thu, 2007-09-06 at 19:55 -0400, Aaron Bentley wrote:
>>> I knew they worked like that initially, but I thought that was just a
>>> temporary first step.  Are we planning to release them this way?
>> I'd like to do *a* release of packs in 0.92. It doesn't have to be the
>> default, but its so much better that I don't see any reason not to
>> release a version as an incremental improvement.
> 
> Okay, everything makes much more sense in that context.  I certainly
> don't want "the perfect" to be the the enemy of "the good", I just
> didn't know your current thoughts.  When/if we start committing
> fulltexts, we can look at my suggestion then.
> 
>>  - Faster commit by better delta logic - e.g. mpdiff/xdelta/blob rather
>> than lines ??? [still not tied down to a decision]
> 
> John, what's your current thinking on xdelta?  I'd like to replace knits
> with *something*, because I think the knit delta format has some
> defects.  I'd be happy to do mpdiff, and I'd be curious to see whether
> the C patience matcher levels the playing field in terms of speed.  But
> I'd also rather xdelta than knit.
> 
> Aaron

Xdelta creates really small, compact deltas, and it does so quickly. But it works
in a way that makes it really tricky to derive any sort of annotation
information: since it is a general compression scheme, a delta's copy
instructions can pull hunks from anywhere in the source material (including
elsewhere in the same file), not just from matching lines of the parent text.

So if you wanted to send a pack stream over the wire, xdelta is a pretty good
way to do it. (It also handles large binary files very well, since it doesn't
care at all about line endings.)

You can simulate "mpdiff" by just concatenating the parent texts, and xdelta
seemed rather decent at that.

It was a bit slower at extracting texts.
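
The "concatenate the parents" trick can be sketched with stdlib zlib, whose
preset-dictionary feature plays roughly the same role as xdelta's source
window. This is only an illustration under that assumption -- the real code
would call xdelta itself, and all names here are made up:

```python
import hashlib
import zlib

# Two pseudo-random (hence incompressible) "parent texts".
parent1 = b"".join(hashlib.sha256(b"p1-%d" % i).digest() for i in range(64))
parent2 = b"".join(hashlib.sha256(b"p2-%d" % i).digest() for i in range(64))
# A child that reuses big pieces of both parents plus one new hunk.
child = parent1[:1024] + b"one brand new hunk\n" + parent2[512:]

def delta_against_parents(parents, text):
    # The concatenated parents act as a preset dictionary, so shared
    # content becomes back-references instead of literal bytes -- the
    # same role xdelta's source material plays.
    comp = zlib.compressobj(zdict=b"".join(parents))
    return comp.compress(text) + comp.flush()

def expand(parents, delta):
    decomp = zlib.decompressobj(zdict=b"".join(parents))
    return decomp.decompress(delta) + decomp.flush()

delta = delta_against_parents([parent1, parent2], child)
full = zlib.compress(child)

assert expand([parent1, parent2], delta) == child
assert len(delta) < len(full)  # the shared parent content deltas away
```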

After analyzing it again, I'm going to say it is still looking pretty good.


The C version of Patience diff makes a huge difference. In my simplistic
benchmarks, diffing two texts (2 versions of builtins.py) took 70ms with the
Python Patience implementation, 2-4ms with xdelta3 (depending on settings), and
4ms with bdiff. With Patience in C, it now takes ~5ms.
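
For shape, here is roughly what a line-based delta of that sort looks like,
with difflib.SequenceMatcher standing in for the patience matcher (a sketch
only; the hunk format below is invented for illustration, not bzrlib's actual
knit delta format):

```python
import difflib

def line_delta(old, new):
    # Return (start, end, replacement-lines) hunks turning old into new.
    # difflib.SequenceMatcher is the stand-in diff engine here.
    hunks = []
    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            hunks.append((i1, i2, new[j1:j2]))
    return hunks

old = ["line %d\n" % i for i in range(500)]
new = old[:100] + ["changed\n"] + old[120:]
hunks = line_delta(old, new)
# A single hunk: lines 100-120 of the old text become one new line.
```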

For 100 entries of builtins.py we get something like:
test_compress_100(lh_child_child,xd3-default)
    13124094 => 74532 ( 1 full: 47628, 99 delta: 26904)
    276ms
test_compress_100(lh_child_child,xd3-djw)
    13124094 => 66728 ( 1 full: 40962, 99 delta: 25766)
    297ms
test_compress_100(lh_child_child,xd3-NOCOMPRESS+zlib)
    13124094 => 66062 ( 1 full: 37897, 99 delta: 28165)
    225ms

test_compress_100(lh_child_child,bdiff-one+zlib)
    13124094 => 91063 ( 1 full: 36672, 99 delta: 54391)
    408ms

test_compress_100(lh_child_child,patience+zlib)
    13124094 => 87439 ( 1 full: 36672, 99 delta: 50767)
    622ms
test_compress_100(lh_child_child,patience+bzip2)
    13124094 => 82341 ( 1 full: 29446, 99 delta: 52895)
    690ms

bzip2 and zlib do a better compression job on a single fulltext than xdelta
does, though not overwhelmingly so (29.4k versus 47.6k). xdelta does a lot
better on the actual deltas.
Xdelta is certainly the fastest to generate (225ms for xd3-NOCOMPRESS+zlib),
and it also gives the smallest overall size.
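
The fulltext side of that comparison is easy to reproduce in miniature with
the stdlib compressors (the stand-in text below is made up, and which
compressor wins depends on the input; bzip2 happened to win on builtins.py):

```python
import bz2
import zlib

# A repetitive stand-in for a source file like builtins.py.
text = b"".join(b"def cmd_%d():\n    return %d\n\n" % (i, i)
                for i in range(300))

zlib_size = len(zlib.compress(text, 9))
bz2_size = len(bz2.compress(text, 9))
# Both general-purpose compressors shrink a single fulltext far below
# its raw size; a delta compressor has no second text to match against
# here, which is why it does worse on the one fulltext.
```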

For extraction times, Patience is actually pretty fast (I think it is faster
than Xdelta, and only slightly slower than mpatch).
Given the above compression, these are the times for extracting the first text,
which has a linear patch chain of 16.

test_100_extract_0(lh_child_child,xd3-default)
    138272 => 47628 +  2315 (16) = 49943
    8ms
test_100_extract_0(lh_child_child,xd3-djw)
    138272 => 40962 +  2293 (16) = 43255
    16ms
test_100_extract_0(lh_child_child,xd3-NOCOMPRESS+zlib)
    138272 => 37897 +  2519 (16) = 40416
    8ms

test_100_extract_0(lh_child_child,bdiff-one+zlib)
    138272 => 36672 +  4608 (16) = 41280
    4ms
test_100_extract_0(lh_child_child,bdiff-multi+zlib)
    138272 => 36672 +  4608 (16) = 41280
    2ms

test_100_extract_0(lh_child_child,patience+zlib)
    138272 => 36672 +  4327 (16) = 40999
    6ms
test_100_extract_0(lh_child_child,patience+bzip2)
    138272 => 29446 +  4840 (16) = 34286
    20ms

bzip2 starts showing how slow it is, as does xd3-djw (which uses the DJW
"secondary compression"). mpatch is hands down the winner (it passes all 16
deltas to be patched onto the fulltext in one call). But patience isn't
horribly behind (and is still faster than xdelta3 in all forms).

Also, note that applying patience patches is still done in Python (it is just
splicing Python lists, which is pretty fast). We could certainly rewrite that in
C for slightly better performance: you still have to interpret the delta chunks
either way, but you could also combine the patches first, rather than updating
a large list multiple times.
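
A minimal sketch of that list-splicing application, including the
linear-chain walk that patch combining would collapse into a single pass
(function names and hunk format invented for illustration):

```python
def apply_delta(lines, hunks):
    # Apply (start, end, replacement-lines) hunks to a list of lines.
    # Going back-to-front keeps earlier hunk offsets valid while the
    # list is being spliced.
    lines = list(lines)
    for start, end, replacement in reversed(hunks):
        lines[start:end] = replacement
    return lines

def extract(fulltext, delta_chain):
    # Walk a linear delta chain (like the 16-deep chain above),
    # materialising every intermediate text. Patch combining would
    # merge the hunks first and splice the big list only once.
    text = fulltext
    for hunks in delta_chain:
        text = apply_delta(text, hunks)
    return text

base = ["line %d\n" % i for i in range(10)]
chain = [
    [(2, 3, ["line two, edited\n"])],  # replace one line
    [(5, 5, ["a new line\n"])],        # pure insertion
]
result = extract(base, chain)
```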

I don't think I have multi-parent hooked up, as it requires a bit of change to
my benchmark logic. But I remember benchmarks where compressing against both
parents beat left-hand child, any child, or left-hand parent in total
compressed size; that approach, however, leaves you with forward deltas.
lh_child does really well if we want backwards deltas (since, in theory, you
extract newer texts more often than older ones).

Something weird happens when we switch over to all 1300 builtins.py texts, though:

test_compress(lh_child_child,xd3-default)
    120,200,294 =>   296,641 (1 full=47,628, 1288 delta=249,013)
    compress: 2141ms
    decompress: 444ms
test_compress(lh_child_child,xd3-djw)
    120,200,294 =>   280,685 (1 full=40,962, 1288 delta=239,723)
    compress: 2261ms
    decompress: 467ms
test_compress(lh_child_child,xd3-NOCOMPRESS+zlib)
    120,200,294 =>   299,714 (1 full=37,897, 1288 delta=261,817)
    compress: 1943ms
    decompress: 471ms

test_compress(lh_child_child,bdiff-one+zlib)
    120,200,294 =>   556,981 (1 full=36,672, 1288 delta=520,309)
    compress: 3546ms
    decompress: 268ms

test_compress(lh_child_child,patience+zlib)
    120,200,294 =>   520,542 (1 full=36,672, 1288 delta=483,870)
    compress: 4970ms
    decompress: 1785ms
test_compress(lh_child_child,patience+bzip2)
    120,200,294 =>   544,167 (1 full=29,446, 1288 delta=514,721)
    compress: 5478ms
    decompress: 2004ms

compress is the time to compress each text against one base (the first
left-hand child, or any child if there is no left-hand child). decompress is
the time to decompress each text relative to its base (producing a fulltext).

Here again, xdelta3 is very fast at compression, and compresses very well. But
it is slower at decompression than bdiff/mpatch. For some reason, patience
performance really bottoms out.

Overall, if we are changing how we do annotations anyway, I think xdelta3 is a
pretty good choice. It compresses quickly and efficiently. It is a little
slower at extraction, but the tradeoff depends on how often you compress versus
extract: compressing the whole history once saves about 1.4s, and each full
extraction pass costs about 176ms more, so you can extract roughly 8 times
before you use up the savings (1.4s / 0.176s =~ 7.95).
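
That break-even figure falls straight out of the tables above:

```python
# Times from the 1300-text run, in seconds.
saved_compress = 3.546 - 2.141  # bdiff-one+zlib vs xd3-default compress
extra_extract = 0.444 - 0.268   # xd3-default vs bdiff decompress
break_even = saved_compress / extra_extract
# ~8 extractions before slower decompression eats the compression savings.
```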

I think the extraction time is still in the "very good" range, and the savings
in compression time and total compressed size make up for the rest. The biggest
problem is that we can't use xdelta for annotations, but as packs are moving
away from that anyway...

John
=:->




More information about the bazaar mailing list