xdelta multi-parent hack ?

John Arbash Meinel john at arbash-meinel.com
Mon Jun 25 18:12:59 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> Robert Collins wrote:
>> On Fri, 2007-06-22 at 12:47 -0500, John Arbash Meinel wrote:
>> I have another thought, which is that we could consider generating a
>> single recipe to reconstruct a revision rather than the recursive
>> recipes we're looking at today.
> 
>> E.g. (using a single parent example) rather than:
>> text1 + delta2 = text2
>> text2 + delta3 = text3
>> ...            = TextIRequested
> 
>> If we assume everything is stored as deltas, then we can reconstruct a
>> text by listing the delta elements that make it up:
>> rev1 bytes 40-50
>> rev2 bytes 20-70
>> rev3 bytes 200-201
> 

That doesn't work for secondary-compressed xdelta hunks. But it might be
interesting to take the FULL TEXT + [DELTAS] and see what sort of compression
that could give.

I would guess that for xdelta it wouldn't be very good, because of how the
algorithms work. (You don't end up with big sections of matching strings, and
xdelta explicitly excludes matches <= 2bytes long.)

For other things that tend to store longer "this hunk" stuff
(bdiff/patiencediff, etc).

xdelta has a very different way of storing its data. It has a section for 'add'
data, for 'instructions' and for 'copy' size/addresses.

Which may mean that genuine hunks that get added end up all sequential in the
'add' section. But I can also guess that doing something like "indent these 10
lines" shows up as 10 "add 4 spaces + copy this range" commands.

So, I think it is worth playing with, it could certainly be interesting. (Given
the fulltext + all deltas to this point, generate an xdelta of the current text.)

I'll see if I can work out the algorithm in my current framework, as it is
quite a bit different.


> That's a very interesting thought.  I designed MultiParent Diff
> specifically to avoid reconstructing intermediate versions.  But the
> line lookup, even though it doesn't do recursive function calls, is
> still a recursive algorithm.
> 
> You're essentially setting it up so that the build parents of each
> revision are all of the ancestors whose lines it contains.  Basically
> precalculated diff composition.
> 
> Aaron

I see it as slightly different (by my description above). You can make the
"PRE" text be all the text they would have had to read so far. For stuff like
bdiff/patience I would expect it to not be very good, because it would include
a lot of spurious "remove this text" sections corresponding to the
control-commands. Though I did find that multi-parent bdiff worked.

While xdelta works as just a "decompress this, given a certain amount of source
material". So it just ignores things that most diff algorithms would mark as
explicitly removed.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGf/eaJdeBCYSNAAMRApo8AJ96o7NqEMoih41z0tEZBMu4w5XpzwCgzMbG
mP5ZzA+E6gr1XJkvri8znuQ=
=lCDQ
-----END PGP SIGNATURE-----



More information about the bazaar mailing list