repository/branch stacking - my current thoughts

Thu Feb 7 14:39:51 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Our repository database allows for some history to be absent. These
> revisions are known as ghosts. This, combined with deliberately causing
> this situation, allows us to have extremely cheap 'standalone branches'.
> We could, for instance, upload a branch which is ~ the size of a bundle,
> but can be pulled into a repository with the revisions it's based upon.
> 
> Such a branch has some problems though - it can't be used on its own,
> because it doesn't have enough content to recreate any commit in it (it
> only has the new deltas of that branch), nor can it be usefully logged
> or used without a repository containing its parent revision.

> So, one way to make it useful in these circumstances is to allow the
> branch to be combined on the fly by bzr with the locally missing
> content.

Another way is to store a very shallow history.  This would increase
storage requirements only slightly, while still allowing interoperation
with current branches.  I doubt the LCA (least common ancestor) of merges
in Bazaar commonly goes back even 100 revisions.
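To make the shallow-history idea concrete: what matters is how far back from either branch tip the merge base lies.  Below is a minimal sketch, assuming a hypothetical in-memory graph that maps each revision id to a tuple of its parent ids (this is not bzrlib's graph API).  It finds the least common ancestors of two tips and reports that distance.

```python
from collections import deque


def distances_from(graph, tip):
    """Breadth-first walk from tip; returns {revision_id: generations back}.

    graph is a hypothetical dict: revision_id -> tuple of parent ids.
    Parents missing from graph (ghosts) are recorded but not walked further.
    """
    dist = {tip: 0}
    queue = deque([tip])
    while queue:
        rev = queue.popleft()
        for parent in graph.get(rev, ()):
            if parent not in dist:  # first visit in BFS is the shortest path
                dist[parent] = dist[rev] + 1
                queue.append(parent)
    return dist


def least_common_ancestors(graph, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = set(distances_from(graph, a)) & set(distances_from(graph, b))
    dominated = set()
    for rev in common:
        # Everything strictly older than a common ancestor is not "least".
        dominated |= set(distances_from(graph, rev)) - {rev}
    return common - dominated


def merge_depth(graph, a, b):
    """Generations back from either tip to the furthest LCA, or None."""
    da, db = distances_from(graph, a), distances_from(graph, b)
    lcas = least_common_ancestors(graph, a, b)
    if not lcas:
        return None  # unrelated histories: no merge base at all
    return max(max(da[rev], db[rev]) for rev in lcas)
```

Roughly, if merge_depth() is within the number of revisions a shallow branch keeps (the 100 guessed above), the merge base should be available locally.  The repeated ancestor walks are quadratic, which is fine for illustration but not for a real repository.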

> Now all of these options have some implications:
>  - mutating operations on the branch/repository have to only affect that
> branch/repository (e.g. 'bzr reconcile' on one of these deliberately
> shallow things needs to not error, fail, or even analyse the external
> references). It also needs to /preserve/ the external references in file
> graphs, etc.
>  - readonly operations such as check really shouldn't access more data
> outside the actual location being examined than necessary.
>  - data access that crosses repository boundaries will require some care
>  - we likely need some way to bring more data into a shallow environment
> deliberately.
>  - I think a composite structure like this should be all-or-nothing -
> missing references are errors, not soft errors.
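The check/reconcile points above can be illustrated with a sketch, assuming a shallow repository that knows its own revision graph plus an explicitly declared set of external references (both hypothetical structures, not bzrlib's on-disk format).  It examines only local data, records the external references so they can be preserved rather than analysed, and treats any other missing parent as a hard error rather than a ghost.

```python
class MissingReference(Exception):
    """Hypothetical error: a parent is neither stored here nor declared external."""


def check_shallow(local_graph, declared_external):
    """Check a shallow repository without touching any other location.

    local_graph: dict revision_id -> tuple of parent ids stored locally.
    declared_external: set of revision ids known to live in another repository.
    Returns the external references actually in use, so that a reconcile
    pass can carry them over unchanged.
    """
    used_external = set()
    for rev, parents in local_graph.items():
        for parent in parents:
            if parent in local_graph:
                continue  # fully checkable from local data
            if parent in declared_external:
                used_external.add(parent)  # preserve it; never fetch or analyse
                continue
            # All-or-nothing: an undeclared gap is an error, not a soft ghost.
            raise MissingReference(f"{rev} refers to unknown parent {parent}")
    return used_external
```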

Another significant consequence is that repositories do not know what
other branches are using their data, and therefore do not know which
revisions are safe to delete.  This goes against our plans for garbage
collection, nuclear launch codes, etc.  One solution would be for
repositories to have a flag indicating whether they permit stacking;
setting that flag would disable garbage collection.
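A sketch of why such a flag has to block garbage collection, assuming a hypothetical Repo record with a permits_stacking attribute and a mark-and-sweep collector that only knows the tips of the branches it can see:

```python
from dataclasses import dataclass


@dataclass
class Repo:
    graph: dict  # hypothetical: revision_id -> tuple of parent ids
    permits_stacking: bool = False  # the proposed per-repository flag


def collect_garbage(repo, branch_tips):
    """Return the revision ids a mark-and-sweep pass would delete."""
    if repo.permits_stacking:
        # Stacked branches we cannot see may build on any revision here,
        # so nothing can be proven unreachable: delete nothing.
        return set()
    reachable = set()
    stack = list(branch_tips)
    while stack:
        rev = stack.pop()
        if rev in reachable or rev not in repo.graph:
            continue  # already marked, or a ghost with no local data
        reachable.add(rev)
        stack.extend(repo.graph[rev])
    return set(repo.graph) - reachable
```

Revisions unreachable from the visible tips may still be the base of some stacked branch elsewhere, so for a stackable repository the only safe sweep is an empty one.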

> In terms of code layout, I think that delegation to other repositories
> should be a core component of the Repository class; I don't think we
> should attempt to do this via layering a decorator on the top, because
> Repository is really a URL located resource in many places in our code
> base, and a decorator doing magic has no URL of its own.

This is a pretty good argument against doing it as a decorator on
Repositories, but it doesn't consider the alternatives.  Repositories
are not responsible for retrieving data; that is the function of
stores.

- doing a decorator on Stores would make sense
- the Storage idea I've proposed explicitly suggests StackedStorage as a
  Storage implementation.
- knits were also originally designed to support stacking.
- Martin has suggested doing stacking in terms of packs.
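For comparison with the Repository-level approach, a decorator on stores could look roughly like this.  DictStore and StackedStore are hypothetical stand-ins: StackedStore is not the StackedStorage proposal itself, and neither class is bzrlib's store API.  Reads fall through to fallback stores, writes go only to the local store, and a key found nowhere is a hard error.

```python
class DictStore:
    """Hypothetical minimal store mapping keys to bytes."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def has(self, key):
        return key in self._data

    def get(self, key):
        return self._data[key]  # raises KeyError when absent

    def add(self, key, value):
        self._data[key] = value


class StackedStore:
    """Decorator: the local store first, then fallbacks, for reads only."""

    def __init__(self, local, fallbacks):
        self.local = local
        self.fallbacks = list(fallbacks)

    def has(self, key):
        return self.local.has(key) or any(f.has(key) for f in self.fallbacks)

    def get(self, key):
        for store in [self.local, *self.fallbacks]:
            if store.has(key):
                return store.get(key)
        # All-or-nothing: missing from every layer is an error, not a ghost.
        raise KeyError(key)

    def add(self, key, value):
        # Mutations affect only the local store, never a fallback.
        self.local.add(key, value)
```

Because the wrapping happens below the Repository, the Repository object itself keeps its own URL.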

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHqxg30F+nu1YWqI0RAtk/AJ9/aY9YhxG+9rhMoFDMHclESJb91QCggH7T
Vi2i0DKYDuxhIO3HG2ncDFQ=
=zyT/
-----END PGP SIGNATURE-----