[RFC] bzr.jrydberg.versionedfile

Wed Dec 21 18:50:26 GMT 2005

Johan Rydberg wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
> 
> 
>>>>I don't see any specific problems. I think it is pretty much what I was
>>>>suggesting. You can do everything direct to disk, you just have to do it
>>>>in the right order. Changes can include revisions which aren't fully
>>>>added yet, since it isn't part of the contract.
>>>
>>>
>>>Yes, with the exception of steps 5 and 6.
>>
>>I'm not sure what you mean here. Let me provide my revised version of
>>your set of steps:
>>
>>  1) Calculate a list of what revisions to fetch.
>>  2) Create an in-memory copy of the local 'revision' knit, and merge
>>     remote versions into the in-memory copy.
>>  3) Merge the 'changes' knit directly to disk (.join)
>>  4) Iterate over the pulled versions of the 'changes' file,
>>     and record them in a list.
>>  5) Iterate over the list, on a per-file basis, and merge the versions
>>     directly to disk.
>>  6) Merge the 'inventory' knit directly to disk (.join)
>>  7) Copy in-memory 'revision' knit to disk (using .join)
>>
>>The only step I might add is a 5b), which before merging an inventory to
>>disk would actually extract the full text, generate the in-memory
>>representation, and verify that all referenced files have all of the
>>referenced revisions. This could be more of an integrity check stage,
>>which could be optional (and in the future removed).
>>We would need a code path for this sort of thing anyway, in the case
>>that a remote 'changes' did not exist. Or is 'changes' going to be a
>>required file?
> 
> 
> Not exactly as I do it.  I never copy the 'revision' knit into an
> in-memory version, since that is not needed.  Instead I simply merge
> the remote versions into the on-disk knit as the last step in the
> fetching.  I.e.:
> 
>   1.  merge 'changes' to on-disk knit
>   2.  collect and merge file versions
>   3.  merge 'inventory' to on-disk knit
>   4.  merge 'revision' to on-disk knit
> 
> And yes, 'changes' is going to be a required file.  If it is not there
> (for some absurd reason), you could always fall back on the slow
> inter-format fetcher.
> 

Now, changes is a 'repository' file, right? Because it is required based
on the storage, not based on the branch or working directory.
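
For what it's worth, here is how I read the four-step order quoted above,
as a rough Python sketch. None of this is the real bzrlib API: Knit, Repo
and fetch are made-up names, the knits are plain dicts, and I am assuming
a 'changes' entry is simply the list of file-ids touched by that
revision. The only point is the ordering: 'revision' is merged last, so a
revision that is visible there already has everything it references.

```python
class Knit:
    """Hypothetical stand-in for a knit: maps version-id -> content."""

    def __init__(self, name):
        self.name = name
        self.versions = {}

    def join(self, other, version_ids):
        """Copy the named versions from `other` that are missing here."""
        for version_id in version_ids:
            if version_id not in self.versions:
                self.versions[version_id] = other.versions[version_id]


class Repo:
    """Hypothetical repository holding the knits discussed in this thread."""

    def __init__(self):
        self.changes = Knit("changes")
        self.inventory = Knit("inventory")
        self.revision = Knit("revision")
        self.files = {}  # file-id -> Knit


def fetch(local, remote, revision_ids):
    # 1. merge 'changes' first
    local.changes.join(remote.changes, revision_ids)

    # 2. collect the file versions named by the pulled changes, then merge
    #    them per file (assumption: a 'changes' entry lists file-ids)
    wanted = {}
    for revision_id in revision_ids:
        for file_id in local.changes.versions[revision_id]:
            wanted.setdefault(file_id, []).append(revision_id)
    for file_id, version_ids in wanted.items():
        knit = local.files.setdefault(file_id, Knit(file_id))
        knit.join(remote.files[file_id], version_ids)

    # 3. merge 'inventory'
    local.inventory.join(remote.inventory, revision_ids)

    # 4. merge 'revision' last, so it never names something we lack
    local.revision.join(remote.revision, revision_ids)
```

If the fetch is interrupted anywhere before step 4, the 'revision' knit
is untouched, and the extra entries in the other knits are harmless.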

> 
>>We have this for knits, in the 'knit index' file. I assume you only
>>write an entry to the index if you completed the write to the knit. So
>>if I write half of an entry, and then get canceled, that chunk is
>>implicitly marked as bad, because there is no index entry which
>>references it.
> 
> 
> That is correct.
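
Just to spell out what I mean by "implicitly marked as bad", a minimal
sketch follows. It is not the actual knit code: append_record and
read_record are hypothetical helpers, the data file is an io.BytesIO, the
index is a plain list, and the `interrupted` flag only simulates a
cancelled write.

```python
import io


def append_record(data, index, version_id, payload, interrupted=False):
    """Write the payload first; add the index entry only when it is complete."""
    offset = data.seek(0, io.SEEK_END)
    if interrupted:
        # Simulate being cancelled halfway: partial bytes, no index entry.
        data.write(payload[: len(payload) // 2])
        return
    data.write(payload)
    index.append((version_id, offset, len(payload)))


def read_record(data, index, version_id):
    """Only bytes referenced by an index entry are ever read back."""
    for entry_id, offset, length in index:
        if entry_id == version_id:
            data.seek(offset)
            return data.read(length)
    raise KeyError(version_id)
```

The half-written chunk stays in the data file as dead bytes, but since
nothing in the index points at it, later records are still found at the
right offsets.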
> 
> 
>>>I had an idea some time ago of having a 'revision-graph' file in .bzr
>>>that contains (revision-id, parents) tuples of all revisions available
>>>in the revision-store.  I think that using such a file is cleaner
>>>design wise, than to rely on the index of the inventory or revision
>>>knits to extract ancestry and graph information about the branch --
>>>esp in the case where the inventory and revision knits are shared
>>>between several branches.
> 
> 
>>Well, you are adding more information into the file, which isn't
>>terrible. Just extra.
>>The question is why have that extra file, if we already have the
>>information in the index? Isn't it redundant, with potential to
>>disagree? (breaking the idea of normalization)
> 
> 
> My plan was that the revision-graph file should be private to the
> branch, whereas the revision index is shared between all branches
> that use the same repository.

If you are private to a branch, then revision-graph would only include
the ancestry of that branch, right? It is subtly, but importantly,
different from the revision-graph for the repository.
One is stating what revisions you have access to, and the other is
stating what revisions have been merged into this branch and are now
part of its ancestry.

> 
> I'm not sure it is a win, or that there is even a need for it, and you are right
> that we might end up in situations where there can be disagreements
> between the index and the graph-file.
> 
> ~j

Well, even with the above constraint that it is only about the actual
branch, I don't know if you gain much. If the branch keeps track of "I
am this revision", then any complete revision graph can give you the
full ancestry for that revision.
And if we change so that 'revision-history' is always the first parent
along the history (so converging two branches actually means that one of
them 'wins'), then a branch is defined purely by a single revision-id.
All of its history is encapsulated in that single identifier. (tags may
not be, though).
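
As a rough illustration (a sketch only, with the repository graph assumed
to be a plain dict of revision-id -> list of parent ids, and both function
names made up for this mail), the branch ancestry and a first-parent
'revision-history' both fall out of the shared graph plus one tip
revision-id:

```python
def ancestry(graph, tip):
    """Every revision reachable from `tip` by following parent links."""
    seen = set()
    pending = [tip]
    while pending:
        revision_id = pending.pop()
        if revision_id in seen:
            continue
        seen.add(revision_id)
        pending.extend(graph[revision_id])
    return seen


def first_parent_history(graph, tip):
    """Follow only the first parent from `tip` back to the root, oldest first."""
    history = []
    revision_id = tip
    while revision_id is not None:
        history.append(revision_id)
        parents = graph[revision_id]
        revision_id = parents[0] if parents else None
    history.reverse()
    return history


# The repository also holds "x", which this branch never merged.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "x": ["a"]}
branch_ancestry = ancestry(graph, "d")
branch_history = first_parent_history(graph, "d")
```

Here "x" is in the repository graph but not in the ancestry of "d", which
is exactly the repository-versus-branch distinction above, and no
per-branch file was needed to recover it.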

In general, I think revision-graph is a non-starter. Simply because the
information is already available elsewhere.

John
=:->
