repository/branch stacking - my current thoughts
Robert Collins
robertc at robertcollins.net
Tue Feb 5 05:59:00 GMT 2008
So there's an interesting facility we have planned but not delivered.
Our repository database allows for some history to be absent; these
absent revisions are known as ghosts. This, combined with deliberately
causing this situation, allows us to have extremely cheap 'standalone
branches'. We could, for instance, upload a branch which is roughly the
size of a bundle, but which can be pulled into a repository that has the
revisions it is based upon.
Such a branch has some problems, though: it can't be used on its own,
because it doesn't have enough content to recreate any commit in it (it
only has the new deltas of that branch), and it can't be usefully logged
or used without a repository containing its parent revision.
So, one way to make it useful in these circumstances is to allow the
branch to be combined on the fly by bzr with the locally missing
content.
There are a few ways this can be done; I think all of them have to be
transitive (if you refer to something that is itself partial, the
reference chains).
One way is to extend the repository with external references. That is,
have a set somewhere in the repository that says:
(revisions, a-revision_id) -> LOCATION
(inventory, a-revision_id) -> LOCATION
(texts, a-file_id, a-revision_id) -> LOCATION
etc
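To make the first option concrete, here is a minimal Python sketch of such a table using plain dicts. The class and function names, the key tuples, and the simplifying assumption that every unchanged text was last modified in the parent revision are all hypothetical, not bzr's actual storage format.

```python
# Hypothetical sketch of option 1: a per-repository table of external references.
# Key kinds mirror the list above: ("revisions", rev_id), ("inventory", rev_id),
# ("texts", file_id, rev_id). Names and layout are illustrative only.
from typing import Dict, Iterable, Tuple

Key = Tuple[str, ...]


class ExternalRefRepository:
    def __init__(self, local: Dict[Key, bytes], external_refs: Dict[Key, str]):
        self.local = local                  # content actually stored here
        self.external_refs = external_refs  # key -> LOCATION holding that key

    def locate(self, key: Key) -> str:
        """Return 'local', or the location recorded for an external key."""
        if key in self.local:
            return "local"
        if key in self.external_refs:
            return self.external_refs[key]
        # Neither stored nor listed: the table itself gives no answer.
        raise KeyError(key)


def refs_for_one_file_change(file_ids: Iterable[str], changed_file: str,
                             parent_rev: str, location: str) -> Dict[Key, str]:
    """Build the table for a branch that changes a single file.

    Simplifying assumption: every unchanged text was last modified in
    parent_rev. Each unchanged file must still be listed, so a tree of
    N files needs N - 1 text references.
    """
    return {("texts", fid, parent_rev): location
            for fid in file_ids if fid != changed_file}
```

For a 50,000-file tree with one changed file, refs_for_one_file_change returns 49,999 entries, which is the scaling problem discussed next.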
This has a problem in that the list can be pretty big. Consider a
Mozilla tree with 50K references: a branch with a delta to just one file
has to refer to 50K-1 external references, so the external reference
management will have to be pretty darn slick.
It also doesn't support random access to references that are not
listed: finding one means visiting, all things being equal, half of the
listed reference locations on average and checking their indices.
Another way is to simply provide a list of locations for the repository.
This scales with places-to-check, not references-made-outside, so it is
simpler to manage. It still costs a visit to half the listed locations
on average, but it is conceptually simpler to approach.
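A rough way to see the 'half the locations' cost: if a lookup probes an ordered list of locations until one index has the key, and the key is equally likely to live in any of n locations, the average number of probes is (n + 1) / 2. The sketch below counts the probes, with made-up location names and Python sets standing in for indices; it illustrates the cost argument and is not how bzr actually queries its indices.

```python
# Sketch of option 2's lookup cost: probe each location's index in order.
# Location names and set-based "indices" are stand-ins for illustration.
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, ...]


def find_in_locations(key: Key, locations: List[str],
                      indices: Dict[str, Set[Key]]) -> Tuple[Optional[str], int]:
    """Return (location holding key or None, number of indices probed)."""
    probes = 0
    for location in locations:
        probes += 1  # each probe may be a network round trip
        if key in indices[location]:
            return location, probes
    return None, probes


def average_probes(n: int) -> float:
    """Average probes when the key is equally likely to be in any of n locations."""
    locations = [f"loc-{i}" for i in range(n)]
    indices = {loc: {("revisions", loc)} for loc in locations}
    total = sum(find_in_locations(("revisions", loc), locations, indices)[1]
                for loc in locations)
    return total / n
```

Here average_probes(4) gives 2.5 and average_probes(10) gives 5.5, while a key held nowhere costs a probe of every location.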
A third way is to provide a location in the branch (or arguably a list,
but I think there is little reason to have an arbitrary list, and a
single pointer avoids things like 'which one to check first' and 'handle
some network issues as soft errors rather than hard errors').
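With a single pointer, the transitive behaviour can be sketched as a walk along the chain of stacked-on locations. This yields one fixed search order and turns any unreachable link into a hard error. The pointers dict, the location names, and the cycle check are assumptions made for illustration.

```python
# Sketch of option 3: each location records at most one location it refers to.
# Following the pointers transitively gives a single, fixed search order.
# `pointers` maps a location to the next one (or None); names are hypothetical.
from typing import Dict, List, Optional


def search_order(start: str, pointers: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of locations to consult, nearest first."""
    order: List[str] = []
    location: Optional[str] = start
    while location is not None:
        if location in order:
            # Guard against a circular chain rather than looping forever.
            raise ValueError(f"stacking cycle at {location}")
        if location not in pointers:
            # A missing link makes the composite incomplete: hard error.
            raise LookupError(f"cannot open referenced location {location}")
        order.append(location)
        location = pointers[location]
    return order
```

With pointers = {"feature": "mirror", "mirror": "trunk", "trunk": None}, search_order("feature", pointers) returns ["feature", "mirror", "trunk"]; there is never a question of which location to check first.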
Now all of these options have some implications:
- mutating operations on the branch/repository must affect only that
branch/repository (e.g. 'bzr reconcile' on one of these deliberately
shallow things must not error, fail, or even analyse the external
references). They also need to /preserve/ the external references in
file graphs etc.
- read-only operations such as check really shouldn't access more data
outside the actual location being examined than necessary.
- data access that crosses repository boundaries will require some care
- we likely need some way to bring more data into a shallow environment
deliberately.
- I think a composite structure like this should be all-or-nothing:
missing references are hard errors, not soft errors.
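Two of these implications might look like this over plain dicts: a check that reads only local data, and a deliberate, all-or-nothing way to pull more data in. The record layout (content plus parent keys) and the function names are hypothetical.

```python
# Sketch of a local-only check and a deliberate, all-or-nothing fill.
# `local` maps a key to (content, parent_keys); `fallback` has the same shape
# for the referenced repository. Names and layout are illustrative only.
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, ...]
Record = Tuple[bytes, Tuple[Key, ...]]


def check_local(local: Dict[Key, Record]) -> Set[Key]:
    """Read-only check that looks only at locally stored records.

    Parents not stored here are reported as external references (to be
    preserved) instead of being fetched or treated as corruption.
    """
    external: Set[Key] = set()
    for _content, parents in local.values():
        for parent in parents:
            if parent not in local:
                external.add(parent)
    return external


def fill(local: Dict[Key, Record], fallback: Dict[Key, Record],
         keys: Iterable[Key]) -> None:
    """Deliberately copy records into the shallow repository, all-or-nothing."""
    wanted = list(keys)
    missing = [k for k in wanted if k not in fallback]
    if missing:
        # Fail before copying anything, so a partial fill never happens.
        raise KeyError(f"not available from fallback: {missing}")
    for key in wanted:
        local[key] = fallback[key]
```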
Now we have some other work under development that will work well with
this: bringing in other delta formats will likely make working across
repository boundaries easier, as we can ask for all the file-versions
that a given repository needs from externally referenced repositories.
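As a sketch of asking for everything at once, assume each external location can answer a set-membership query against its index. We walk the locations in search order and hand each one the whole remaining set, so the number of requests scales with locations rather than with keys. The function name and data shapes are assumptions.

```python
# Sketch: batch the needed file-versions per location instead of per key.
# `indices` stands in for each location's index; names are hypothetical.
from typing import Dict, List, Set, Tuple

Key = Tuple[str, ...]


def plan_fetch(needed: Set[Key], locations: List[str],
               indices: Dict[str, Set[Key]]) -> Dict[str, Set[Key]]:
    """Group needed keys by the first location, in search order, that has them."""
    remaining = set(needed)
    plan: Dict[str, Set[Key]] = {}
    for location in locations:
        if not remaining:
            break
        # One request per location, carrying every key still outstanding.
        found = remaining & indices[location]
        if found:
            plan[location] = found
            remaining -= found
    if remaining:
        # All-or-nothing: an unsatisfied key is an error, not a partial result.
        raise LookupError(f"no location has: {sorted(remaining)}")
    return plan
```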
In terms of code layout, I think that delegation to other repositories
should be a core component of the Repository class. I don't think we
should attempt to do this by layering a decorator on top, because
Repository is really a URL-located resource in many places in our code
base, and a decorator doing magic has no URL of its own.
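A minimal sketch of delegation living inside the repository class follows. The names (Repository, stack_on, base) are hypothetical, not bzrlib's real API. The point is that the stacked repository keeps its own URL in base, whereas a decorator wrapping several repositories would have no single URL to report.

```python
# Sketch: delegation built into the repository class itself.
# Class, method and attribute names are hypothetical, not bzrlib's API.
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


class Repository:
    def __init__(self, base: str, records: Dict[Key, bytes]):
        self.base = base  # the URL this repository lives at
        self._records = records
        self._fallbacks: List["Repository"] = []

    def stack_on(self, other: "Repository") -> None:
        """Record another repository to consult for content missing here."""
        self._fallbacks.append(other)

    def get(self, key: Key) -> bytes:
        found = self._lookup(key)
        if found is None:
            raise KeyError(key)  # missing anywhere in the stack: hard error
        return found

    def _lookup(self, key: Key) -> Optional[bytes]:
        if key in self._records:
            return self._records[key]
        for fallback in self._fallbacks:  # fallbacks may stack too (transitive)
            found = fallback._lookup(key)
            if found is not None:
                return found
        return None
```

Code that uses repo.base, for example to lock or report a location, still sees the stacked repository's own URL.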
Feedback/missed angles/discussion solicited.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
More information about the bazaar mailing list