[RFC] bzr.jrydberg.versionedfile

John Arbash Meinel john at arbash-meinel.com
Wed Dec 21 15:25:48 GMT 2005


Johan Rydberg wrote:
> Johan Rydberg <jrydberg at gnu.org> writes:
> 
> 
>>To follow the rules above, but still have acceptable performance, a
>>few tricks must be used.  For control knits (knits in groups 1 and 2)
>>an in-memory knit is created and merged with the on-disk knit.  The
>>in-memory knit is then also merged with the remote knit.  At this
>>point all the entries needed are in memory, fetched in the most
>>efficient way (KnitVersionedFile.join is more efficient than
>>extracting texts).  When the time comes, the on-disk knit is merged
>>with the in-memory knit.  
> 
> 
> Note to self (and others): This will increase both memory and I/O
> overhead since the whole contents of the on-disk knit will have to be
> read into memory before starting to fetch remote versions.

In your proposal, you don't write out anything until everything has been
committed, right? But you still have the problem that the texts need to
be written before the inventory, and before the revision entry.

So why not just do that explicitly, like we do now? As long as you merge
all of the texts for an inventory before you add the inventory entry,
there is no problem if you pull in a couple of extras ahead of time.
And since knits already keep their index in memory at all times, you
have already cached which revisions each knit holds, so you wouldn't
even need to reread the index file. (Though if you did, that is still
much cheaper than rereading an entire knit/weave.)
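
To make the point concrete, here is a minimal sketch of why the in-memory
index makes these membership checks cheap. The names (KnitIndex,
has_version, missing_versions) are illustrative only, not the real bzrlib
API:

```python
# Sketch only: because a knit's index is read once at open time and kept
# in memory, asking "does this knit have revision X" is a dict lookup,
# not a file read. All names here are hypothetical.

class KnitIndex:
    """Minimal in-memory index mapping version ids to record offsets."""

    def __init__(self, entries):
        # entries: iterable of (version_id, offset) pairs, read once
        # from the index file when the knit is opened.
        self._positions = dict(entries)

    def has_version(self, version_id):
        # No disk access needed: the whole index is already in memory.
        return version_id in self._positions

    def missing_versions(self, wanted):
        # Which of the wanted revisions this knit does not yet contain.
        return [v for v in wanted if v not in self._positions]

index = KnitIndex([("rev-1", 0), ("rev-2", 120)])
print(index.missing_versions(["rev-1", "rev-3"]))  # ['rev-3']
```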

So my proposal is the same as what I said for weaves...

When fetching changes into the local branch (in preparation for merge,
etc), do these steps:

	1) Grab a list of revisions
	2) Figure out the set of files involved. This is either done by
	   reading inventories, or with your delta object.
	3) For each file, either:
		a) Pull in only changes which match the list of
		   revisions you are expecting to fetch
		b) Pull in everything, because usually the waste
		   will be very small (usually none)
	4) Fetch the text of the inventory, and check all of the
	   associated texts, to make sure they have what you need
	5) Commit this inventory, then commit the revision
	6) Go back to 2 for the next inventory.
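
The six steps above can be sketched roughly as follows, using plain dicts
as stand-ins for the source and target repositories. None of these names
are bzrlib's real fetcher API; the sketch just shows the ordering
invariant: file texts land before their inventory, and the inventory
before its revision entry, so whatever is on disk is always consistent.

```python
# Hedged sketch of the fetch loop described above. Everything here is a
# stand-in: repositories are dicts, and the "knit join" is a plain copy.

def fetch(source, target, revision_ids):
    for rev_id in revision_ids:                    # 1) revisions to pull
        # 2) the inventory names the files involved in this revision.
        inventory = source["inventories"][rev_id]
        for file_id, text_version in inventory.items():
            texts = target["texts"].setdefault(file_id, {})
            # 3a) pull only the text versions we expect for this fetch.
            for version, text in source["texts"][file_id].items():
                if version == text_version and version not in texts:
                    texts[version] = text
        # 4) verify every text the inventory references is now present.
        for file_id, text_version in inventory.items():
            assert text_version in target["texts"][file_id]
        # 5) commit the inventory, then the revision entry.
        target["inventories"][rev_id] = inventory
        target["revisions"][rev_id] = source["revisions"][rev_id]
        # 6) the loop continues with the next inventory.

source = {
    "inventories": {"rev-1": {"foo": "foo-v1"}},
    "texts": {"foo": {"foo-v1": "hello\n"}},
    "revisions": {"rev-1": "revision record"},
}
target = {"inventories": {}, "texts": {}, "revisions": {}}
fetch(source, target, ["rev-1"])
print(sorted(target["revisions"]))  # ['rev-1']
```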

The key is in item 3: when you first see that file "foo" was touched,
you either pull just the changes you expect (because you already have
the list of revisions), or you just pull everything.
With this setup, the only things you need in memory are a list of the
revision ids you are going to add, the current inventory you are working
on, and the indexes for all of the knits.
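
The choice in item 3 could look something like this (a sketch, with
made-up names; the waste_limit threshold is my own illustration, not
anything in bzr):

```python
# Sketch of choosing between 3a (pull only expected revisions) and
# 3b (pull everything, accepting a little waste). All names invented.

def versions_to_pull(remote_versions, wanted_revisions, waste_limit=0):
    wanted = set(remote_versions) & set(wanted_revisions)
    extra = set(remote_versions) - wanted
    if len(extra) <= waste_limit:
        # 3b) pulling everything costs little and is simpler.
        return set(remote_versions)
    # 3a) pull only the versions we are expecting to fetch.
    return wanted

print(sorted(versions_to_pull(["a", "b", "c"], ["a", "c"])))  # ['a', 'c']
```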

I know you were thinking about rolling back transactions, but I really
don't think we want to worry about that, because then you need a
transaction id, a log to say which transaction ids have been
committed, etc.

So far bzr has used the model of 'everything on disk is consistent,
though some of it may be dead', and I think that is a fine model. I
fear 'truncate' and 'delete', since there is no way to go back after
either has been done.

One of the desired properties of knits is that they don't modify old
history. So once it has been committed, it won't be modified, deleted,
etc. That was something I really didn't like about weaves (a bug in a
new version of bzr could destroy all of your history).

Anyway, I think you can tweak the KnitFetcher to perform well without
having to read everything into memory.

John
=:->

> 
> Some initial numbers:
> 
>   The unsafe method (old fetcher):
> 
>   real    2m30.801s
>   user    1m22.416s
>   sys     0m15.575s
> 
>   The safe method (new fetcher):
> 
>   real    5m36.910s
>   user    3m41.084s
>   sys     0m56.903s
> 
> Not good :/
> 
> ~j
