backward diffs in knits?

John Arbash Meinel john at arbash-meinel.com
Wed Apr 11 15:33:04 BST 2007


Robert Collins wrote:
> On Tue, 2007-04-10 at 09:54 -0500, John Arbash Meinel wrote:
>>
>> I think that in the common case of files growing, backwards deltas are
>> slightly smaller, and it optimizes for extracting the TIP revision,
>> which helps for things like "bzr checkout", "bzr update", and "bzr
>> diff".
>>
>>
>> When I was discussing this stuff with Martin and Robert, I didn't think
>> the idea was to actually store backwards deltas in Knits, but a
>> different storage format.
> 
> Well there are several discussions going on concurrently with no
> particular urgency:
>  - a long term wishful-thinking idea for lockless repositories to
> increase concurrency. The current thinking on this is that of large blob
> files with built in indexes and potentially heavily optimised for size -
> so deltas could be out-of-ancestry-but-within-the-blob etc.
>  - continued incremental improvements to knits - can we get more
> performance without changing the disk format.
>  - things like Aaron's proposal for multi-parent deltas, or this thread
> on backwards diffs: things that we can do by changing the knit format
> but don't require radical changes to the system or have long dependency
> chains that would require significant work.
> 
>> I don't think backwards deltas should be generated "on-the-fly", because
>> of the potential for a new corruption to lose old data.
> 
> I'm not sure what you mean here; in particular, you can always perform a
> check of all data after the pull, if and only if we have altered
> representation.
> 
>> However, it could make sense for an "archive" command, where you
>> packaged up a bunch of older revisions to make them more dense, and sort
>> of put them off to the side. They should still be accessible, but maybe
>> they wouldn't be in the default search every time.
> 
> I see this as a variation on 'pack' - is that what you were thinking of?
> 
> 
>> And I can say that bzr.dev is 54MB for .bzr/ and 11MB for working files
>> (including .pyc, etc). So I'm curious if we can do better than 5x.
> 
> How much is inventory ? :).
> 
> 
> -Rob

1.4MB for .kndx and 17MB for .knit.

But I also found that this branch is improperly packed (all the index
files need repacking, etc.)

So let me fix these numbers...

After rebuilding inventory.knit it dropped from 17MB to 16MB (our
default packing is actually quite good for bzr.dev)

But inventory.kndx dropped from 1.4MB to 900K.

After repacking all kndx files .bzr dropped to 52MB. So now we are at
52:11. Still about 5:1.
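
For reference, the arithmetic behind those ratios can be sketched in a few
lines of Python. This is only an illustration: the helper name
size_ratio is made up here, and treating 900K as roughly 0.9MB is an
assumption.

```python
def size_ratio(repo_mb, tree_mb):
    """Return how many times larger the .bzr/ directory is than the working files."""
    return repo_mb / tree_mb


# Sizes quoted in this thread, in MB.
WORKING_TREE_MB = 11
before_repack = size_ratio(54, WORKING_TREE_MB)
after_repack = size_ratio(52, WORKING_TREE_MB)

# Fraction saved by rebuilding the inventory files.
# Assumption: 900K is treated as approximately 0.9MB.
knit_saving = 1 - 16 / 17
kndx_saving = 1 - 0.9 / 1.4

print(f"before repack: {before_repack:.1f}:1")
print(f"after repack:  {after_repack:.1f}:1")
print(f"inventory.knit shrank by about {knit_saving:.0%}")
print(f"inventory.kndx shrank by about {kndx_saving:.0%}")
```

Both ratios round to about 5:1, so repacking the indexes barely moves the
overall number even though inventory.kndx itself shrinks by roughly a third.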

John
=:->



