[MERGE] Add simple Storage object

Tue Jan 29 06:11:24 GMT 2008

On Mon, 2008-01-28 at 23:37 -0500, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > I don't see a functional difference between just making everything in
> > the base Repository be defined in the primitives you mention, and what
> > you present.
> 
> I agree there's no functional difference between those two options.  I
> want to structure it this way because I think it has benefits for people
> developing the code
> 
> 1. It is very clear what a Storage object must implement.  It would not
> be nearly so clear what a Repository derivative must implement.

Well, using NotImplementedError & our repository interface tests, I have
to disagree with you about the difference in clarity here.

> 2. By grouping new, shiny methods (e.g. get_parent_map) together away
> from old, unloved methods (e.g. get_ancestry), it clarifies why there is
> overlap between methods-- that one set of methods is the public API and
> one set is the implementation API.

There is overlap because one method is old, with warnings about
scalability, and one is newer, and we're still evolving to get the right
balance (networks are hard, hmmm).

> 3. The implementation API is not a very friendly one, so it does not
> make sense to expose it on Repository.

The implementation API could be a set of _ methods.

> 4. The methods on Storage objects are public, but this does not require
> all repositories to implement such public methods.

Same as for 3.

> 5. It creates a boundary between core and non-core functionality.  This
> should encourage people to add methods to Repository and not Storage.
> This will reduce the growth of the Storage interface.

This is an interesting point; I wonder about performance implications.

> 6. Pack repositories have 81 public methods by my count (see below).
> This is large and intimidating, and it does not include
> Repository.weave_store or VersionedFile, both of which are arguably part
> of the Repository interface.  Adding even more methods would not be nice.

KnitPackRepository implements 22 methods however; which is pretty small
and minimal.

> 7. Smaller interfaces are easier to test, and therefore easier to test
> thoroughly.

This point I completely agree with.

I have a serious concern about adding Storage in the way you are doing
so, which is about getting the benefits of your point 7. My concern is
that having
Repository
 |
StorageRepository->Storage

we will still be allowing people to vary Repository in anyway they want
- its the current public subclassable interface, and the factory logic
for disk formats creates RepositoryFormat and Repository instances.
Secondly we have many different implementations of much of Repository,
*because* different storage idioms required different tradeoffs, and we
should consolidate that before we replace the varied implementations
with a common one (because the consolidation will give us the interface
we need, if not yet on a separate class).

But if Storage is meant to represent the core disk code, then really, we
should be having the factory Logic generate Storage and StorageFormat
instances, with Repository becoming a static class in bzrlib which is
parameterised by the Storage instance.

And crucially, to move to this design safely I have two things I think
are very important:
 - we should not be trying to build out a separate backend without
pivoting the disk format factory logic in a single (small) step. (this
prevents a bunch of potential skew happening)
 - we should not replace any currently overridable method with a non
overridable one until we have transformed all the implementations we
know of such that its guaranteed correct. I'm happy to dig up examples
of where this matters in bzr itself, but bzr-svn is pretty close to my
mind in this regard too.

I don't know that you are right about having a very small storage API.
My strong suspicion is that for optimisation we will continue to want a
rich interface that people override behaviours on specific to their disk
layout; and if they are doing that then Storage is dead weight rather
than a benefit. (Consider RemoteRepository here - unless Storage
is /really clever/, or asynchronous, RemoteRepository will still have to
be the interface, and we'll need all the same interface tests we have
today - a failure on your point 7 IMO).

But I'm willing to give it the benefit of the doubt. To migrate safely
in light of the two specific things I consider really important, I
suggest a series of patches:
An initial one that either:
(
 - renames Repository to Storage (in code and tests)
 - renames RepositoryFormat to StorageFormat (in code and tests)
 - creates a new Repository user class which decorates Storage, and
passes all the existing Repository methods through. (but this should not
be called Repository, because plugins subclass Repository, and this is
an API break)
)
or
(
 - creates a new user visible RepositoryHandle or similar class which
delegates everything to a Repository instance
 - renames Repository to _Repository (and likewise _RepositoryFormat) in
code and tests
 - creates a Repository subclass of Repository which has a deprecation
warning on its constructor (this provides api compatibility)
)

Followed up by as many small patches as makes sense (balancing
size/review time etc) that trim methods from the old heirarchy by making
the implementation on the user facing facade be in terms of the small
API you desire - the tests for the deleted methods can be removed, and
the new user facing facade can have tests about how it uses the backend
implementation. I think one patch per method would make the most sense:
it would keep the size of review very straight forward and allow the
greatest chance of finding a glitch. I'll be happy to help with this
process (because if we're going to do it, lets get it done).

> > But adding another abstraction to Repository, which is already having
> > problems - previous cleanups not completed - seems to me to be making
> > the api issues there worse, not better.
> 
> I am not sure what you're referring to.  Could you expand on that?

Each time we've added repository disk types, we've added new partial
abstractions to the family of types in Repository, but not removed the
old ones. RevisionStore for instance makes /no/ sense for Pack
repositories, but the users interface to Repository essentially exposes
RevisionStore sufficiently often that the cleanest way to implement the
pack repo was through an adapter to RevisionStore rather than just a
couple of methods.

We've basically got a number of 'Storage-like' objects already; and they
are insufficient to handle all the variation between Repository disk
format capabilities. This is why I'm sceptical about adding another
Storage interface being particularly better.

> Even if there are some partially-completed cleanups, I still think that
> it is worth it to try and clean up the API.  You know, if at first you
> don't suceed...

Agreed; but perhaps we should try something different? I'll sleep on
this and see if any good ideas about different ways to look at this area
of bzrlib crop up.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080129/c7da3c68/attachment.pgp