[RFC] bzr.jrydberg.versionedfile

Johan Rydberg jrydberg at gnu.org
Wed Dec 21 15:47:07 GMT 2005


John Arbash Meinel <john at arbash-meinel.com> writes:

>> Note to self (and others): This will increase both memory and I/O
>> overhead since the whole contents of the on-disk knit will have to be
>> read into memory before starting to fetch remote versions.
>
> In your proposal, you don't write out anything until everything has been
> committed, right? But you still have the problem that the texts need to
> be written before the inventory, and before the revision entry.

Here is a more fine-grained outline of what the new fetcher does:

  1) Calculate a list of what revisions to fetch.
  2) Create an in-memory copy of the local 'revision' knit, and merge
     remote versions into the in-memory copy.
  3) Do the same as (2) for the 'inventory' and 'changes' knits.
  4) Iterate over the pulled versions of the 'changes' file,
     and record them in a list.
  5) Iterate over the list, on a per-file basis, and merge the versions
     directly to disk.
  6) Copy the in-memory 'changes' and 'inventory' knits to disk (using .join)
  7) Copy the in-memory 'revision' knit to disk (using .join)

Far from optimal, but it uses the defined APIs.
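
In rough pseudo-Python, that is (only .join comes from the proposed
versionedfile API; in_memory_copy, get_changes_for, file_knit and the
rest are names I am making up for illustration):

    def fetch(local, remote, revision_ids):
        # (2) merge remote 'revision' versions into an in-memory copy
        revisions = local.revision_knit.in_memory_copy()
        revisions.join(remote.revision_knit, version_ids=revision_ids)

        # (3) the same for the 'inventory' and 'changes' knits
        inventories = local.inventory_knit.in_memory_copy()
        inventories.join(remote.inventory_knit, version_ids=revision_ids)
        changes = local.changes_knit.in_memory_copy()
        changes.join(remote.changes_knit, version_ids=revision_ids)

        # (4) record the file versions named by the pulled 'changes'
        needed = {}
        for revision_id in revision_ids:
            for file_id, version in changes.get_changes_for(revision_id):
                needed.setdefault(file_id, []).append(version)

        # (5) merge file texts directly to disk, one file at a time
        for file_id, versions in needed.items():
            local.file_knit(file_id).join(remote.file_knit(file_id),
                                          version_ids=versions)

        # (6) and (7): flush the in-memory knits to disk, revisions last
        local.changes_knit.join(changes)
        local.inventory_knit.join(inventories)
        local.revision_knit.join(revisions)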

> So why not just do that explicitly, like we do now? As long as you merge
> all of the texts for an inventory before you add the inventory entry,
> there is no problem if you pull in a couple extras ahead of time.
> And since knits already always keep their index in memory, you already
> have cached which knits have which revisions, so you wouldn't even need to
> reread the index file. (Though if you did, that is still much cheaper
> than rereading an entire knit/weave).
>
> So my proposal is the same as what I said for weaves...
>
> When fetching changes into the local branch (in preparation for merge,
> etc), do these steps:
>
> 	1) Grab a list of revisions
> 	2) Figure out the set of files involved. This is either done by
> 	   reading inventories, or with your delta object.
> 	3) For each file, either:
> 		a) Pull in only changes which match the list of
> 		   revisions you are expecting to fetch
> 		b) Pull in everything, because usually the waste
> 		   will be very small (usually none)
> 	4) Fetch the text of the inventory, and check all of the
> 	   associated texts, to make sure they have what you need
> 	5) Commit this inventory, then commit the revision
> 	6) Go back to 2 for the next inventory.
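
If I follow, in the same kind of pseudo-Python (every name here is
made up; the point is only the ordering: file texts, then inventory,
then revision):

    def fetch_stepwise(local, remote, revision_ids):     # (1)
        for revision_id in revision_ids:                 # (6)
            # (2) figure out the set of files this revision touches
            for file_id, versions in remote.get_changed_files(revision_id):
                # (3a) pull only the file versions we expect to need
                local.file_knit(file_id).join(remote.file_knit(file_id),
                                              version_ids=versions)
            # (4) fetch the inventory and check the file texts against it
            inventory = remote.get_inventory(revision_id)
            if not all_texts_present(local, inventory):
                raise AssertionError('missing texts for %s' % revision_id)
            # (5) only then commit the inventory, then the revision
            local.add_inventory(revision_id, inventory)
            local.add_revision(revision_id, remote.get_revision(revision_id))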

I guess the 'changes' knit could be merged directly to disk, since
it is more or less just a cache of two compared inventories, and is
not used for anything other than speeding up the fetcher.  You then
iterate over the changes and collect file information.  File versions
can then be merged directly to disk as well.  When that is done, I see
no problem in merging the inventories directly to disk, and you finish
off by simply pulling the revisions.  Something like the sketch below.
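
Roughly (again, only .join is from the API; collect_file_versions and
friends are illustrative):

    def fetch_direct(local, remote, revision_ids):
        # the 'changes' knit is just a fetcher-cache, so merge it
        # straight to disk
        local.changes_knit.join(remote.changes_knit,
                                version_ids=revision_ids)
        # collect file information from the pulled changes, and merge
        # the file texts directly to disk as well
        for file_id, versions in collect_file_versions(local.changes_knit,
                                                       revision_ids):
            local.file_knit(file_id).join(remote.file_knit(file_id),
                                          version_ids=versions)
        # then the inventories, and finally the revisions
        local.inventory_knit.join(remote.inventory_knit,
                                  version_ids=revision_ids)
        local.revision_knit.join(remote.revision_knit,
                                 version_ids=revision_ids)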

Do you see any problems with that?

One show-stopper is my plan to incorporate the 'changes' information
into the 'revision' file.  Maybe it is better to keep them separate
and think of the changes as a fetcher cache (one that never needs to
be invalidated).

> [...]
>
> So far bzr has used the 'everything on disk is consistent, though some
> of it may be dead', and I think that is a fine model. I fear 'truncate'
> and 'delete'. Since there is no way to go back after it has been done.

Yes, you and Robert have convinced me.  Thanks :)

~j




