[RFC] bzr.jrydberg.versionedfile

John Arbash Meinel john at arbash-meinel.com
Wed Dec 21 16:09:56 GMT 2005


Johan Rydberg wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
> 
> 
>>>Note to self (and others): This will increase both memory and I/O
>>>overhead since the whole contents of the on-disk knit will have to be
>>>read into memory before starting to fetch remote versions.
>>
>>In your proposal, you don't write out anything until everything has been
>>committed, right? But you still have the problem that the texts need to
>>be written before the inventory, and before the revision entry.
> 
> 
> Here is a more fine-grained outline of what the new fetcher does:
> 
>   1) Calculate a list of which revisions to fetch.
>   2) Create an in-memory copy of the local 'revision' knit, and merge
>      remote versions into the in-memory copy.
>   3) Do the same as (2) for the 'inventory' and 'changes' knits.
>   4) Iterate over the pulled versions of the 'changes' file,
>      and record them in a list.
>   5) Iterate over the list, on a per-file basis, and merge the
>      versions directly to disk.
>   6) Copy the in-memory 'changes' and 'inventory' knits to disk
>      (using .join).
>   7) Copy the in-memory 'revision' knit to disk (using .join).
> 
> Far from optimal, but uses the defined APIs.

What would you consider optimal, and how much would have to change to
get us there? I don't think we are stuck on any specific API; we won't
reach 'stable' until February. :) Far better to do the right thing now
than be hackish.
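
To make sure I am reading that outline the same way you are, here is
roughly how I picture it in pseudo-Python (Knit, copy_in_memory, join
and collect_file_versions are illustrative names, not the real API):

    def fetch(local, remote):
        # 1) Calculate the list of revisions to fetch.
        wanted = [r for r in remote.all_revision_ids()
                  if not local.has_revision(r)]

        # 2) and 3) Merge remote versions into in-memory copies of
        # the 'revision', 'inventory' and 'changes' knits.
        revisions = local.revision_knit.copy_in_memory()
        inventories = local.inventory_knit.copy_in_memory()
        changes = local.changes_knit.copy_in_memory()
        revisions.join(remote.revision_knit, version_ids=wanted)
        inventories.join(remote.inventory_knit, version_ids=wanted)
        changes.join(remote.changes_knit, version_ids=wanted)

        # 4) and 5) Collect per-file versions from the pulled changes,
        # then merge the file texts directly to disk.
        for file_id, versions in collect_file_versions(changes, wanted):
            local.get_file_knit(file_id).join(
                remote.get_file_knit(file_id), version_ids=versions)

        # 6) and 7) Only now copy the in-memory knits to disk,
        # revisions last, so a reader never sees a revision whose
        # texts or inventory are missing.
        local.changes_knit.join(changes)
        local.inventory_knit.join(inventories)
        local.revision_knit.join(revisions)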

> 
> 
>>So why not just do that explicitly, like we do now? As long as you
>>merge all of the texts for an inventory before you add the inventory
>>entry, there is no problem if you pull in a couple of extras ahead of
>>time. And since knits already always keep their index in memory, you
>>already have cached which knits have which revisions, so you wouldn't
>>even need to reread the index file. (Though even if you did, that is
>>still much cheaper than rereading an entire knit/weave.)
>>
>>So my proposal is the same as what I said for weaves...
>>
>>When fetching changes into the local branch (in preparation for merge,
>>etc), do these steps:
>>
>>	1) Grab a list of revisions
>>	2) Figure out the set of files involved. This is either done by
>>	   reading inventories, or with your delta object.
>>	3) For each file, either:
>>		a) Pull in only changes which match the list of
>>		   revisions you are expecting to fetch
>>		b) Pull in everything, because the waste will
>>		   usually be very small (often none)
>>	4) Fetch the text of the inventory, and check all of the
>>	   associated texts to make sure you have what you need
>>	5) Commit this inventory, then commit the revision
>>	6) Go back to 2 for the next inventory.
> 
> 
> I guess the 'changes' knit could be merged directly to disk, since
> that is more or less just a cache of two compared inventories, and is
> not used for anything other than to speed up the fetcher.  You then
> iterate over the changes and collect file information.  File versions
> can then be merged directly to disk as well.  When that is done, I see
> no problem in merging the inventories directly to disk.  And finish
> off by simply pulling the revisions.
> 
> Do you see any problems with that?

I don't see any specific problems. I think it is pretty much what I was
suggesting. You can do everything directly to disk, you just have to do
it in the right order. The 'changes' file can include revisions which
aren't fully added yet, since completeness there isn't part of the
contract.
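
Per revision, the ordering would look something like this
(pseudo-Python again; topological_sort and the knit accessors are
made up for illustration):

    def fetch_direct(local, remote, wanted):
        # Parents before children, so each step is self-contained.
        for revision_id in topological_sort(wanted):
            # The 'changes' cache is safe to write first: nothing
            # treats its presence as a completeness guarantee.
            delta = remote.get_changes(revision_id)
            local.changes_knit.add(revision_id, delta)

            # All file texts for this revision...
            for file_id, version in delta.file_versions():
                local.get_file_knit(file_id).join(
                    remote.get_file_knit(file_id), version_ids=[version])

            # ...then the inventory that refers to them...
            local.inventory_knit.join(remote.inventory_knit,
                                      version_ids=[revision_id])

            # ...and only then the revision entry itself, so its
            # presence always implies everything underneath it.
            local.revision_knit.join(remote.revision_knit,
                                     version_ids=[revision_id])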

> 
> One show stopper is my plan to incorporate the 'changes' information
> into the 'revision' file.  Maybe it is just better to keep them
> separated and think of the changes as a fetcher-cache (that never
> needs to be invalidated.)

Yes, I would keep them separate. Having a 'revision' entry indicates
that you have all of its changes. We could define the marker
differently, but we want something that gives that guarantee.

One alternative would be to have a WAL (write-ahead log) of sorts,
which is just a list of revision-ids that have been committed to the
store. The transaction id then becomes the revision id (which is really
what we are doing right now; we are just using the revision-store as
the WAL).

Then someone could just read the 'revision-id' list when they grab the
branch (right now they use revision-history). revision-history doesn't
contain merged revisions, so it can't be used as a list of valid entries
in the store. By creating an explicit list of "these are the valid
revisions in this store", a single file could be read, and then you
would know, for all texts and all inventories, which entries were valid.
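
A minimal sketch of what I mean (the 'revision-ids' file name and the
controlfilename helper are illustrative, not existing API):

    import os

    def record_committed(branch, revision_id):
        # Texts, inventory and the revision entry are all written
        # first; appending the id here is the commit point of the
        # transaction.
        f = open(branch.controlfilename('revision-ids'), 'a')
        try:
            f.write(revision_id + '\n')
            f.flush()
            os.fsync(f.fileno())
        finally:
            f.close()

    def valid_revisions(branch):
        # Anything not listed here is treated as absent, whatever
        # else happens to be sitting in the stores.
        f = open(branch.controlfilename('revision-ids'))
        try:
            return [line.rstrip('\n') for line in f]
        finally:
            f.close()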

I'm not advocating this heavily, but it might be something to think about.

> 
> 
>>[...]
>>
>>So far bzr has used the 'everything on disk is consistent, though some
>>of it may be dead' model, and I think that is a fine one. I fear
>>'truncate' and 'delete', since there is no way to go back after either
>>has been done.
> 
> 
> Yes, you and Robert have convinced me.  Thanks :)
> 
> ~j
> 

John
=:->