[RFC] killing versioned file.join
Robert Collins
robertc at robertcollins.net
Wed Apr 9 03:44:33 BST 2008
So this is a little rambly... I've chatted with poolie and spiv about
this, and this is my attempt to bring all the threads together.
I want to remove VersionedFile.join. It was useful when it was
introduced because of the way fetch was structured, but we really want a
streaming format, which we have now via Repository.get_data_stream.
So, to remove VersionedFile.join, the fetch module needs to start using
Repository.get_data_stream and insert_data_stream.
This is complicated by the need to upcast data when converting between
repository formats, and by ordering requirements in some cases.
On a local fetch operation we can examine both sides and run special
code, but when fetching from a smart server it's more complex, because
with either a newer or older client, we cannot guarantee that any
description of a repository will be understandable.
In what ways do repositories vary today:
* serialisation of metadata (xml4/5/6/7/8/journalled-inv/...)
* model of metadata (plain, rich-root, subtrees)
* atomic insertion, or a required insertion order of texts,
inventories, signatures, revisions.
And in future:
* delta logic
To eliminate join locally I need to handle only what we do today; but we
should have something relatively compatible with the future planned
changes.
Now, we can ignore differing serialisations for now - they fall back to
the full tree api.
Model changes go through the plain fetch code path, and so does
knit->pack fetching, as well as knit->knit.
For best performance and memory use...
For same model fetches, we want the following:
knit->knit:
* all text knit hunks, per versionedfile, topological order
* ditto inventory
* ditto signatures
* ditto revisions
knit->pack:
* as per knit->knit works fine
pack->knit:
* as per knit->knit
pack->pack (occurs when fetching from a RemoteRepository only):
* hunks from each pack, in forward-read IO order
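To make "topological order" concrete, here is a minimal Python sketch. It is not bzrlib code: parent_map is a hypothetical mapping from revision id to parent ids for a single versioned file, and the ordering comes from the standard library's graphlib module (Python 3.9+). It produces an order in which every parent precedes its children, so a delta is never streamed before the hunk it builds on.

```python
from graphlib import TopologicalSorter


def topological_order(parent_map):
    """Return revision ids so that every parent precedes its children.

    parent_map: hypothetical dict of revision id -> tuple of parent ids.
    Parents that are not keys of parent_map (for example ghosts, or
    revisions outside the fetch) are ignored for ordering purposes.
    """
    known = set(parent_map)
    sorter = TopologicalSorter()
    for revision_id, parents in parent_map.items():
        # add(node, *predecessors): predecessors are emitted first.
        sorter.add(revision_id, *[p for p in parents if p in known])
    return list(sorter.static_order())


# A small history with one merge: rev-1 -> (rev-2a, rev-2b) -> rev-3.
parent_map = {
    "rev-3": ("rev-2a", "rev-2b"),
    "rev-2a": ("rev-1",),
    "rev-2b": ("rev-1",),
    "rev-1": (),
}
order = topological_order(parent_map)
```

The relative order of rev-2a and rev-2b is not fixed by the graph, so any consumer should rely only on the parents-first property.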
For different model fetches, we want the following:
knit->knit:
* all text knit hunks, per versionedfile, topological order
* the inventories needed to iterate the revision trees of
the revisions being fetched: this means we need the basis
inventory text, and then the knit hunks.
* the signature knit hunks in topological order
* the revision knit hunks in topological order
knit->pack:
* same as knit->knit
pack->knit:
* same as knit->knit
pack->pack:
* same inventory details as knit->knit
* all text hunks, in optimal IO order
* signature and revision hunks
So it seems to me that today, we simply need to control two things in
getting a data stream to eliminate join() from fetch:
- whether we supply data in read-optimal order or non-atomic-insert
order
- whether we supply enough data to reconstruct all inventories, or
not
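One way to read the two lists above is that the whole matrix collapses to two booleans. The sketch below is my interpretation of those lists, not existing bzrlib code; the function name and its arguments are hypothetical. It assumes read-optimal order is only wanted when both ends are packs, and that complete inventories are only wanted when the models differ.

```python
def stream_requirements(source_is_pack, target_is_pack, same_model):
    """Return (read_optimal_order, complete_inventory) for one fetch.

    Assumption drawn from the lists above: only pack->pack asks for
    IO-optimal ordering; every other pairing wants the per-versionedfile
    topological order that a non-atomic target can insert directly.
    """
    read_optimal_order = bool(source_is_pack and target_is_pack)
    # A model change needs enough inventory data to iterate the
    # revision trees, i.e. the basis inventory plus the hunks.
    complete_inventory = not same_model
    return read_optimal_order, complete_inventory


# knit->pack with the same model: insert order, no extra inventories.
knit_to_pack = stream_requirements(False, True, same_model=True)
# pack->pack across a model change: IO order, complete inventories.
pack_to_pack_upgrade = stream_requirements(True, True, same_model=False)
```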
So -
repository.get_data_stream_for_search(search, data_order,
complete_inventory)
data_order in ("read-optimal", "nonatomic-insert")
complete_inventory in (False, True)
is my proposed replacement API.
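As a rough illustration of how the two parameters might behave, here is a self-contained Python sketch. It is a toy over an in-memory list, not the real Repository method: the Record type, the offset field (standing in for on-disk position), the basis_of mapping and the free-function form are all assumptions made for the example. Within one kind, the nonatomic-insert branch simply keeps storage order, where a real implementation would need topological order.

```python
from collections import namedtuple

# Hypothetical record type; 'offset' stands in for on-disk position.
Record = namedtuple("Record", "kind revision_id offset")

DATA_ORDERS = ("read-optimal", "nonatomic-insert")
# Insertion order required by a non-atomic target, as listed earlier.
KIND_ORDER = {"texts": 0, "inventories": 1, "signatures": 2, "revisions": 3}


def get_data_stream_for_search(records, search, data_order,
                               complete_inventory, basis_of=None):
    """Return an iterator over the selected records in the requested order.

    records: iterable of Record (a toy in-memory store).
    search: set of revision ids to fetch.
    basis_of: hypothetical dict of revision id -> basis revision id,
        consulted only when complete_inventory is true.
    """
    if data_order not in DATA_ORDERS:
        raise ValueError("unknown data_order: %r" % (data_order,))
    wanted = set(search)
    extra_inventories = set()
    if complete_inventory:
        # Pull in basis inventories that the search itself does not cover.
        for revision_id in wanted:
            basis = (basis_of or {}).get(revision_id)
            if basis is not None and basis not in wanted:
                extra_inventories.add(basis)
    selected = [
        r for r in records
        if r.revision_id in wanted
        or (r.kind == "inventories" and r.revision_id in extra_inventories)
    ]
    if data_order == "read-optimal":
        selected.sort(key=lambda r: r.offset)
    else:
        selected.sort(key=lambda r: (KIND_ORDER[r.kind], r.offset))
    return iter(selected)


records = [
    Record("revisions", "r2", 0),
    Record("texts", "r2", 1),
    Record("inventories", "r1", 2),
    Record("inventories", "r2", 3),
    Record("texts", "r1", 4),
]
io_order = list(get_data_stream_for_search(
    records, {"r2"}, "read-optimal", False))
insert_order = list(get_data_stream_for_search(
    records, {"r2"}, "nonatomic-insert", True, basis_of={"r2": "r1"}))
```

Passing plain strings for data_order keeps the call easy to express over the smart server, at the cost of needing the explicit validation shown at the top of the function.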
Thoughts? I plan to hack on this now, so an uncrafted reply now is
better than a crafted one tomorrow.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.