Introduction to history deltas

Wed Dec 7 13:50:31 GMT 2005

Johan Rydberg <jrydberg at gnu.org> writes:

>> If the knits are packed together for each file, the *number* of fetches
>> actually *is* O(modified files). The knits are spread over the modified
>> files, so that's one fetch for each of them. And since the chunks are
>> ordered by date of addition, you usually want everything from some
>> offset till the end of file from each of them.
>
> I agree with you that in the end what you do is "fetch all missing
> deltas for knit-file K".  But the way you do it is by fetching delta
> by delta, based on information from the pulled inventory.  So it is a
> delta-based, and not file-based, pull.   (see fetch.py)

I've given this a bit more thought, and it should be possible to write a
somewhat optimized fetcher for knits by first collecting the set of all
deltas that need to be fetched, and then fetching them in ranges as
large as possible.  So instead of doing:

 1. fetch revision
 2. fetch inventory
 3. fetch knit deltas
 4. goto 1 if not finished

you do:

 1. fetch revision
 2. fetch inventory
 3. collect knit deltas 
 4. goto 1 if not finished
 5. fetch all collected knit deltas as efficiently as possible.
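The two-phase scheme above might be sketched in Python roughly like this.  It
is a minimal illustration, not bzr's fetch.py.  It assumes a hypothetical
per-file knit index that maps each version to an (offset, length) pair in
that file's knit data, and it treats each pulled inventory as a plain dict of
file_id to text version.  The names collect_needed, coalesce, plan_fetch,
fetch_all and the read_range callback are all made up for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical knit index: file_id -> {version: (offset, length)} in that knit.
KnitIndex = Dict[str, Dict[str, Tuple[int, int]]]
Plan = Dict[str, List[Tuple[int, int]]]


def collect_needed(inventories: Iterable[Dict[str, str]],
                   have: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Steps 1-4: walk pulled inventories and only *record* missing deltas."""
    needed: Set[Tuple[str, str]] = set()
    for inv in inventories:
        # each inventory is assumed to map file_id -> text version
        for file_id, version in inv.items():
            if (file_id, version) not in have:
                needed.add((file_id, version))
    return needed


def coalesce(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge (offset, length) pairs that overlap or touch into larger ranges."""
    merged: List[Tuple[int, int]] = []  # (start, end) pairs, end exclusive
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e - s) for s, e in merged]


def plan_fetch(needed: Set[Tuple[str, str]], index: KnitIndex) -> Plan:
    """Step 5, planning: group needed deltas per knit and coalesce ranges."""
    per_file: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for file_id, version in needed:
        per_file[file_id].append(index[file_id][version])
    return {fid: coalesce(r) for fid, r in per_file.items()}


def fetch_all(plan: Plan,
              read_range: Callable[[str, int, int], bytes]) -> Dict[str, List[bytes]]:
    """Step 5, fetching: one read per coalesced range (read_range is hypothetical)."""
    return {fid: [read_range(fid, off, ln) for off, ln in rs]
            for fid, rs in plan.items()}
```

For example, if versions v2 at (10, 5) and v3 at (15, 7) of the same knit
are both needed, plan_fetch returns the single range (10, 12) for that
file.  When the missing deltas form the tail of a knit, this keeps the
number of reads at one per modified file.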

This could be implemented by giving Knit a "join" method that takes
another knit and a set of versions to pull.
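A join along those lines might look something like the toy below.  The Knit
class here is a hypothetical in-memory stand-in (version -> parents and
delta text), not the real knit storage format or its API.  The point it
illustrates is that join copies the requested versions plus any ancestors
the target lacks, walking the source in its append order so that every
delta lands after the texts it is based on.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class Knit:
    """Toy in-memory stand-in for a knit: version -> (parents, delta)."""

    def __init__(self) -> None:
        self._versions: Dict[str, Tuple[List[str], str]] = {}
        self._order: List[str] = []  # append-only order, like a knit file

    def add(self, version: str, parents: List[str], delta: str) -> None:
        # a delta can only be stored once its basis texts are present
        for p in parents:
            if p not in self._versions:
                raise ValueError("parent %r not present" % p)
        self._versions[version] = (list(parents), delta)
        self._order.append(version)

    def has_version(self, version: str) -> bool:
        return version in self._versions

    def versions(self) -> List[str]:
        return list(self._order)

    def join(self, other: "Knit", versions: Optional[Iterable[str]] = None) -> int:
        """Copy `versions` (default: all) plus missing ancestors from `other`."""
        wanted = set(other._order if versions is None else versions)
        # pull in ancestors we lack so every copied delta has its basis
        stack = list(wanted)
        while stack:
            v = stack.pop()
            for p in other._versions[v][0]:
                if p not in wanted and not self.has_version(p):
                    wanted.add(p)
                    stack.append(p)
        copied = 0
        # other's append order is topological: parents precede children
        for v in other._order:
            if v in wanted and not self.has_version(v):
                parents, delta = other._versions[v]
                self.add(v, parents, delta)
                copied += 1
        return copied
```

In the scheme above, step 5 would then call something like
target.join(source, needed_versions) once per modified knit, where
needed_versions is the (hypothetical) set collected in step 3.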

Maybe other optimizations can be done to speed it up further.

Another issue with knits is that the inventory knit will become rather
big, especially if it is shared between several branches (using
archives).  Maybe the inventory and revision knits should be kept in the
branch instead of in a shared "control store."

~j