backward diffs in knits?

John Arbash Meinel john at arbash-meinel.com
Wed Apr 11 00:03:09 BST 2007


Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> I think that in the common case of files growing, backwards deltas are
>> slightly smaller, and it optimizes for extracting the TIP revision,
>> which helps for things like "bzr checkout", "bzr update", and "bzr diff".
> 
> Well, diff is already handled by using the basis tree, right?

'status' is, but you still need the full texts to generate a diff.
Oh, and it would probably affect 'commit' too, since you need to extract
the previous full text to generate the commit delta.
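The tip-extraction argument can be illustrated with a toy model. This is a sketch under simplifying assumptions, not bzr's knit code: the helpers `make_delta`, `apply_delta`, `store_forward`, `store_backward`, and `extract` are hypothetical. Deltas are plain line hunks computed with Python's `difflib`, and each chain has a single full text with no intermediate snapshots. The model counts how many deltas must be applied to rebuild a revision under each layout.

```python
import difflib


def make_delta(source, target):
    """Line delta turning `source` into `target`, as (start, end, new_lines)
    hunks. A simplified stand-in for a knit delta, not bzr's real format."""
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    return [(i1, i2, target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(source, hunks):
    """Rebuild the target; hunk offsets index into the original `source`."""
    result, pos = [], 0
    for start, end, new_lines in hunks:
        result.extend(source[pos:start])
        result.extend(new_lines)
        pos = end
    result.extend(source[pos:])
    return result


def store_forward(versions):
    """Full text of the OLDEST version, then deltas old -> new."""
    deltas = [make_delta(a, b) for a, b in zip(versions, versions[1:])]
    return versions[0], deltas


def store_backward(versions):
    """Full text of the TIP, then deltas new -> old."""
    rev = versions[::-1]
    deltas = [make_delta(a, b) for a, b in zip(rev, rev[1:])]
    return versions[-1], deltas


def extract(full_text, deltas, steps):
    """Apply the first `steps` deltas of a chain to its full text.
    Returns the reconstructed lines and how many deltas were applied."""
    text = full_text
    for hunks in deltas[:steps]:
        text = apply_delta(text, hunks)
    return text, len(deltas[:steps])


# Hypothetical history: a file that grows by five lines per commit.
versions = [[f"line {i}\n" for i in range(10 + 5 * k)] for k in range(20)]

base, fwd = store_forward(versions)
tip, bwd = store_backward(versions)

# Tip under forward deltas: replay the whole chain. Under backward: nothing.
fwd_tip, fwd_cost = extract(base, fwd, len(fwd))
bwd_tip, bwd_cost = extract(tip, bwd, 0)
assert fwd_tip == bwd_tip == versions[-1]
print(fwd_cost, bwd_cost)  # 19 0

# Inserted line content stored in the deltas themselves.
print(sum(len(new) for d in fwd for _, _, new in d))  # 95
print(sum(len(new) for d in bwd for _, _, new in d))  # 0
```

Under this model the extraction cost moves rather than disappears: rebuilding the oldest revision from the backward chain takes 19 delta applications instead. For a file that only grows, the backward deltas are pure deletions, so the added lines are stored once, in the tip's full text.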


> 
>> When I was discussing this stuff with Martin and Robert, I didn't think
>> the idea was to actually store backwards deltas in Knits, but a
>> different storage format.
> 
> It sounds like they're now considering storing backwards deltas in Knits.
> 
>> I don't know how much compression one could get. But at least the Xorg
>> folks said that their packed history is approx the same size as a checkout.
> 
> Do they do in-tree builds?  That would skew the numbers a lot.
> 
>> Speaking of which, we might also consider using .7z as a storage format.
> 
> .7z has never struck me as a widely supported system, but if
> there are high-quality implementations available, I guess that's okay.
> Certainly, its other characteristics sound nice.
> 
> Aaron

Oh, it still has quite a bit of flux to deal with. There are multiple
incompatible Linux implementations of LZMA (one group decided they
wanted to be able to stream, and the other group wanted to be compatible
with the Windows 7z format). I think there are some really good concepts
involved, and performance is quite nice, but the politics of 7z may be
enough to avoid it for now.

IIRC, the original '7zip' code is pretty tied to Win32. (I'm not sure
why you would tie a compression format to a platform rather than keeping
things separate, but maybe it is because they wanted to support
multi-threaded compression.)

So it may just be that we take the ideas from there (LZMA, "solid"
archives with indexes, etc.) rather than using the specific format.
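The "solid archive with an index" idea can be sketched with Python's standard `lzma` module. The helpers `pack_solid`, `pack_separately`, and `read_text` are hypothetical, and the sample texts are synthetic. All texts are concatenated and compressed as one stream, so content shared between revisions is encoded once, while an index of (offset, length) pairs locates each text after decompression.

```python
import lzma


def pack_solid(texts):
    """Compress all texts as ONE stream; index maps text -> (offset, length)."""
    index, offset = [], 0
    for t in texts:
        index.append((offset, len(t)))
        offset += len(t)
    return lzma.compress(b"".join(texts)), index


def pack_separately(texts):
    """Baseline: one independent LZMA stream per text."""
    return [lzma.compress(t) for t in texts]


def read_text(blob, index, i):
    """Decompress the solid stream and slice out text `i` via the index."""
    data = lzma.decompress(blob)
    start, length = index[i]
    return data[start:start + length]


# Hypothetical history: 20 revisions of a file that grows each commit.
texts = ["".join(f"line {j}\n" for j in range(10 + 5 * k)).encode()
         for k in range(20)]

blob, index = pack_solid(texts)
separate = pack_separately(texts)

assert read_text(blob, index, 3) == texts[3]
# Solid stream size vs. total size of the per-text streams.
print(len(blob), sum(len(c) for c in separate))
```

The trade-off shows up in `read_text`: fetching one text decompresses the whole stream, so a practical format would likely compress in bounded blocks, each with its own index entries.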

John
=:->



More information about the bazaar mailing list