[RFC] bzr.jrydberg.versionedfile

Wed Dec 21 17:16:14 GMT 2005

John Arbash Meinel <john at arbash-meinel.com> writes:

>>>I don't see any specific problems. I think it is pretty much what I was
>>>suggesting. You can do everything direct to disk, you just have to do it
>>>in the right order. Changes can include revisions which aren't fully
>>>added yet, since it isn't part of the contract.
>> 
>> 
>> Yes, with the exception of steps 5 and 6.
>
> I'm not sure what you mean here. Let me provide my revised version of
> your set of steps:
>
>   1) Calculate a list of what revisions to fetch.
>   2) Create an in-memory copy of the local 'revision' knit, and merge
>      remote versions into the in-memory copy.
>   3) Merge the 'changes' knit directly to disk (.join)
>   4) Iterate over the pulled versions of the 'changes' file,
>      and record them in a list.
>   5) Iterate over the list, on a per-file basis, and merge the versions
>      directly to disk.
>   6) Merge the 'inventory' knit directly to disk (.join)
>   7) Copy in-memory 'revision' knit to disk (using .join)
>
> The only step I might add is a 5b), which before merging an inventory to
> disk would actually extract the full text, generate the in-memory
> representation, and verify that all referenced files have all of the
> referenced revisions. This could be more of an integrity check stage,
> which could be optional (and removed in the future).
> We would need a code path for this sort of thing anyway, in the case
> that a remote 'changes' did not exist. Or is 'changes' going to be a
> required file?
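
The 5b check could look roughly like the following Python sketch. It
assumes (hypothetically) that an extracted inventory has already been
reduced to a mapping from file id to the revision that last changed
that file, and that the per-file knits can report which versions they
hold. `missing_file_revisions` and `check_before_inventory_join` are
illustrative names, not bzrlib functions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory shapes, not bzr's real classes:
#   inventory:     file_id -> revision id that last modified that file
#   file_versions: file_id -> set of revision ids present in that file's knit
Inventory = Dict[str, str]
FileVersions = Dict[str, Set[str]]


def missing_file_revisions(inventory: Inventory,
                           file_versions: FileVersions) -> List[Tuple[str, str]]:
    """Return (file_id, revision_id) pairs the inventory references
    but the per-file storage does not contain."""
    missing = []
    for file_id, revision_id in sorted(inventory.items()):
        if revision_id not in file_versions.get(file_id, set()):
            missing.append((file_id, revision_id))
    return missing


def check_before_inventory_join(inventory: Inventory,
                                file_versions: FileVersions) -> None:
    """Step 5b: refuse to merge an inventory whose file texts are absent."""
    missing = missing_file_revisions(inventory, file_versions)
    if missing:
        raise ValueError(f"inventory references missing file revisions: {missing}")
```

Since it only reads what has already been written, the check can be
switched off (or dropped later) without changing the fetch order.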

Not exactly how I do it.  I never copy the 'revision' knit into an
in-memory version, since that is not needed.  Instead I simply merge
the remote versions into the on-disk knit as the last step of the
fetch, i.e.:

  1.  merge 'changes' to on-disk knit
  2.  collect and merge file versions
  3.  merge 'inventory' to on-disk knit
  4.  merge 'revision' to on-disk knit

And yes, 'changes' is going to be a required file.  If it is not there
(for some absurd reason), you could always fall back on the slow
inter-format fetcher.
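
The four-step order above can be sketched with toy in-memory knits.
Everything here is hypothetical: `Knit`, `Repo`, `fetch`, the
`slow_fetch` callback, and the assumption that each 'changes' entry
lists the file ids a revision touched, with file version ids equal to
revision ids. The point is only the ordering, with 'revision' merged
last so a revision counts as fetched only once everything it depends
on is already stored.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class Knit:
    """Toy stand-in for a knit: version id -> (parents, content).

    Hypothetical; a real knit also has an index file and stores deltas.
    """

    def __init__(self) -> None:
        self.versions: Dict[str, Tuple[Tuple[str, ...], object]] = {}

    def add(self, version: str, parents: Sequence[str], content: object) -> None:
        self.versions[version] = (tuple(parents), content)

    def join(self, other: "Knit", version_ids: Sequence[str]) -> List[str]:
        """Copy versions missing here from `other`; return the ids copied.

        `version_ids` is assumed to be in topological order (parents first).
        """
        added = []
        for version in version_ids:
            if version not in self.versions:
                self.versions[version] = other.versions[version]
                added.append(version)
        return added


class Repo:
    """Hypothetical repository: three shared knits plus one knit per file id."""

    def __init__(self) -> None:
        self.changes: Optional[Knit] = Knit()  # None models a missing 'changes'
        self.inventory = Knit()
        self.revision = Knit()
        self.files: Dict[str, Knit] = {}


def fetch(local: Repo, remote: Repo, revision_ids: Sequence[str],
          slow_fetch: Callable[[Repo, Repo, Sequence[str]], object]) -> object:
    if remote.changes is None or local.changes is None:
        # No 'changes' knit to drive the fast path: use the slow fetcher.
        return slow_fetch(local, remote, revision_ids)
    # 1. 'changes' first; entries here do not make a revision "present".
    local.changes.join(remote.changes, revision_ids)
    # 2. Collect the file versions named by each requested revision's
    #    'changes' entry, then merge them file by file.
    wanted: Dict[str, List[str]] = {}
    for rev in revision_ids:
        _parents, changed_file_ids = local.changes.versions[rev]
        for file_id in changed_file_ids:
            wanted.setdefault(file_id, []).append(rev)
    for file_id, revs in wanted.items():
        local.files.setdefault(file_id, Knit()).join(remote.files[file_id], revs)
    # 3. Inventory after the file texts it refers to.
    local.inventory.join(remote.inventory, revision_ids)
    # 4. 'revision' last: this is what marks each revision as fetched.
    return local.revision.join(remote.revision, revision_ids)
```

Because step 2 walks every requested revision rather than only the
'changes' entries that were new, re-running an interrupted fetch still
picks up file versions that were missed; `join` skips versions that
are already present.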

> We have this for knits, in the 'knit index' file. I assume you only
> write an entry to the index if you completed the write to the knit. So
> if I write half of an entry, and then get canceled, that chunk is
> implicitly marked as bad, because there is no index entry which
> references it.

That is correct.
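
To illustrate the "index entry marks the record as valid" rule, here
is a toy data file plus index file in Python. The layout (`.data` and
`.index` files, one 'version offset length' line per record) is made
up for the example and is not the knit format; it only shows that
bytes written without a matching index line are never read.

```python
import os
from typing import Dict, Tuple


class AppendOnlyStore:
    """Toy data-file + index-file pair (hypothetical format, not bzr's).

    add() appends the record bytes to the data file and flushes them to
    disk before it appends the index line 'version offset length'.  If
    the process dies in between, the data file ends with bytes that no
    index line points at, and readers never look at them.
    """

    def __init__(self, path: str) -> None:
        self.data_path = path + ".data"
        self.index_path = path + ".index"

    def add(self, version: str, payload: bytes) -> None:
        with open(self.data_path, "ab") as data:
            data.seek(0, os.SEEK_END)
            offset = data.tell()
            data.write(payload)
            data.flush()
            os.fsync(data.fileno())
        # Writing the index entry is what "commits" the record.
        with open(self.index_path, "a", encoding="utf-8") as index:
            index.write(f"{version} {offset} {len(payload)}\n")

    def read_index(self) -> Dict[str, Tuple[int, int]]:
        entries: Dict[str, Tuple[int, int]] = {}
        if not os.path.exists(self.index_path):
            return entries
        with open(self.index_path, encoding="utf-8") as index:
            for line in index:
                if not line.endswith("\n"):
                    continue  # final line cut off before its newline: ignore
                version, offset, length = line.split()
                entries[version] = (int(offset), int(length))
        return entries

    def get(self, version: str) -> bytes:
        offset, length = self.read_index()[version]
        with open(self.data_path, "rb") as data:
            data.seek(offset)
            return data.read(length)
```

Later records are appended after any orphaned bytes, so the garbage
costs some space but never has to be cleaned up for correctness.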

>> I had an idea some time ago of having a 'revision-graph' file in .bzr
>> that contains (revision-id, parents) tuples of all revisions available
>> in the revision-store.  I think that using such a file is a cleaner
>> design than relying on the index of the inventory or revision knits
>> to extract ancestry and graph information about the branch,
>> especially when the inventory and revision knits are shared
>> between several branches.

> Well, you are adding more information into the file, which isn't
> terrible. Just extra.
> The question is why have that extra file, if we already have the
> information in the index? Isn't it redundant, with potential to
> disagree? (breaking the idea of normalization)

My plan was that the revision-graph file should be private to the
branch, whereas the revision index is shared between all branches
that use the same repository.

I'm not sure it is a win, or even that there is a need for it, and
you are right that we might end up in situations where the index and
the graph file disagree.
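
A sketch of the branch-private 'revision-graph' idea, plus the
consistency check its redundancy with the shared index would require.
The one-line-per-revision format ('revision-id parent-id ...'), the
function names, and the plain dict standing in for the shared revision
index are all assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# revision id -> tuple of parent revision ids
Graph = Dict[str, Tuple[str, ...]]


def parse_revision_graph(lines: Iterable[str]) -> Graph:
    """Parse the hypothetical format: 'revision-id parent-id parent-id ...'."""
    graph: Graph = {}
    for line in lines:
        fields = line.split()
        if fields:
            graph[fields[0]] = tuple(fields[1:])
    return graph


def ancestry(graph: Graph, tip: str) -> Set[str]:
    """All revisions reachable from `tip`; parents absent from the
    graph (ghosts) are skipped."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev in seen or rev not in graph:
            continue
        seen.add(rev)
        stack.extend(graph[rev])
    return seen


def disagreements(branch_graph: Graph, shared_index: Graph) -> List[str]:
    """Revision ids whose parents differ between the branch-private
    graph and the shared index, or which the index lacks entirely."""
    return sorted(rev for rev, parents in branch_graph.items()
                  if shared_index.get(rev) != parents)
```

Revisions that appear only in the shared index are not reported, since
a branch's graph is expected to cover just its own ancestry.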

~j