[MERGE] Add simple Storage object
aaron at aaronbentley.com
Sat Feb 9 08:03:03 GMT 2008
Robert Collins wrote:
> I agree there is a difference in clarity here. But - pack and knits
> don't have the same model for storage any more than weaves and knits;
> certainly things like svn and hg don't have much in common at all.
SVN is definitely not a candidate for a Storage-based implementation.
It can't even do Repository.iter_files_bytes efficiently, so AFAICT,
it's a basket case and it's amazing it works as well as it does.
> I'm not saying Storage has no value, but I am saying much of the
> problems with Repository are /not/ the lack of Storage, but just
> Repository not getting enough clean-me-up-scotty love.
I think that's basically orthogonal. Repositories could be implemented
entirely on top of Storage, and still provide the same grotty interface
as they do now. Logically, they could also be implemented via the
template method, and still provide this grotty interface.
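To make the two options concrete, here is a minimal sketch of both, not bzrlib code. Every class below is hypothetical, and the signature given to iter_byte_streams (a list of keys in, (key, bytes) pairs out) is an assumption made only for illustration. The point is that callers see the same get_text() either way.

```python
class DictStorage:
    """Hypothetical Storage backend: yields (key, bytes) for requested keys."""

    def __init__(self, texts):
        self._texts = dict(texts)

    def iter_byte_streams(self, keys):
        # Assumed signature, for illustration only.
        for key in keys:
            yield key, self._texts[key]


class StorageRepository:
    """Composition: the repository delegates raw access to a Storage."""

    def __init__(self, storage):
        self._storage = storage

    def get_text(self, key):
        for _, data in self._storage.iter_byte_streams([key]):
            return data
        raise KeyError(key)


class TemplateRepository:
    """Template method: subclasses override a single primitive hook."""

    def get_text(self, key):
        return self._read_bytes(key)

    def _read_bytes(self, key):
        raise NotImplementedError


class DictTemplateRepository(TemplateRepository):
    """Hypothetical format that supplies the hook from an in-memory dict."""

    def __init__(self, texts):
        self._texts = dict(texts)

    def _read_bytes(self, key):
        return self._texts[key]
```

Either repository can keep whatever public interface it has today; the difference is only in where the format-specific code lives.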
>> I would really appreciate critical analysis in this area. If there's
>> anything about this interface that's too limiting, it would be really
>> helpful to know in advance.
> Latency - the smart server is our biggest brightest hope for really
> extreme performance over the network; and primitive methods which result
> in many api calls locally become dramatically more expensive over the
> smart server (even when they may not be so expensive over sftp because
> of object based caching).
Nothing says that the Storage API must be transmitted over the network
verbatim. For example, I would imagine that when we call
get_parents_map(), it would retrieve more data than was actually
requested, cache it locally, and satisfy it from cache later.
Retrieving 10 ancestors instead of 1 could cut round-trips to a tenth.
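A rough sketch of that read-ahead idea, under stated assumptions: ReadAheadParentsCache, make_linear_history, make_fetcher and walk_to_root are hypothetical names, the "server" is an in-memory dict, and each call to the fetcher stands in for one network round trip.

```python
class ReadAheadParentsCache:
    """Hypothetical client-side wrapper; not part of bzrlib."""

    def __init__(self, fetch_with_ancestors, depth=10):
        # fetch_with_ancestors(keys, depth) -> {key: tuple of parent keys}
        self._fetch = fetch_with_ancestors
        self._depth = depth
        self._cache = {}
        self.round_trips = 0

    def get_parents_map(self, keys):
        missing = [k for k in keys if k not in self._cache]
        if missing:
            # One simulated round trip fetches the missing keys plus
            # up to `depth` generations of their ancestry.
            self.round_trips += 1
            self._cache.update(self._fetch(missing, self._depth))
        return {k: self._cache[k] for k in keys if k in self._cache}


def make_linear_history(length):
    """Build rev-0 <- rev-1 <- ... as a {key: parents} dict."""
    graph = {"rev-0": ()}
    for n in range(1, length):
        graph["rev-%d" % n] = ("rev-%d" % (n - 1),)
    return graph


def make_fetcher(graph):
    """Return a stand-in for the server side of the request."""
    def fetch(keys, depth):
        result = {}
        pending = list(keys)
        for _ in range(depth):
            next_pending = []
            for key in pending:
                if key in graph and key not in result:
                    result[key] = graph[key]
                    next_pending.extend(graph[key])
            pending = next_pending
        return result
    return fetch


def walk_to_root(provider, tip):
    """Follow first parents from tip, asking for one key at a time."""
    seen = []
    key = tip
    while key is not None:
        seen.append(key)
        parents = provider.get_parents_map([key]).get(key, ())
        key = parents[0] if parents else None
    return seen
```

On a linear history of 30 revisions, walking from the tip one key at a time costs 30 simulated round trips with depth=1 and 3 with depth=10, while the caller's API stays the same.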
But get_build_parents seems uselessly generic to me now that I've
started to implement the acceleration. I now think a method that
determines all the build-dependencies of a given fulltext would be best.
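As a sketch of what such a method might compute, assume each text is stored either as a fulltext or as a delta against exactly one compression parent. The function name and the compression_parents mapping (key to basis key, or None for a fulltext) are hypothetical, chosen only to illustrate the idea.

```python
def build_dependencies(key, compression_parents):
    """Return every key needed to reconstruct `key`, fulltext basis first.

    compression_parents maps each key to the key its delta is based on,
    or to None when the key is stored as a fulltext (assumed layout).
    """
    chain = []
    seen = set()
    current = key
    while current is not None:
        if current in seen:
            raise ValueError("cycle in compression parents at %r" % (current,))
        seen.add(current)
        chain.append(current)
        current = compression_parents[current]
    # Reverse so the fulltext comes first and deltas apply in order.
    chain.reverse()
    return chain
```

A caller can then request the whole chain in one go rather than discovering each basis with a separate call.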
> The vast bulk of the code in bzrlib is dealing with a basically coherent model.
> Until we have one of:
> in the core library, I don't think we'll really have a handle on the
> amount of variation required to support with high performance
> significant variation in the core; and I think we can and should do
> that. I found while tuning commit that we still have a lot of stuff on
> Repository that is basically noise for modern repositories.
I think our biggest concern is supporting our own repositories really
well. If svn or hg or git can't support Storage efficiently, they can
provide an implementation of Repository that is tuned to their
journalled inventories are interesting in that their "delta format" is
useful in addition to their fulltext format. But we can retrieve their
delta form through Storage.iter_raw_items and their fulltext form
It seems plausible that path-tokens might need a new kind of data to be
recorded, but that could be handled by opening up a new namespace for
such data, and then accessing it via iter_byte_streams or iter_raw_items.
Of course, we might also need new APIs. It's hard to know at this
point, but I suspect that Storage will serve us well even if we need to
extend it to support them.
> Saying that 'Both Pack1 and Knit1 are implemented by subclasses of
> Storage with a StorageRepository' does allow us to move some of the
> permutations to Storage* tests; but it doesn't actually make it easier
> for other repository implementors.
It certainly does make it easier for other repository implementors, but
not for all of them. I'm thinking Pack format 2, Remote repository,
database/OLPC repository, Stacked Repository, etc.
> I'd like to see the functionality extracted into helper methods on
> Repository, and done across all formats so that the top level Repository
> public methods start to be solid and sane - they are currently very
> weave, or even just a bunch of files centric.
The problem with Weave or pre-Weave formats is that iter_raw_items
doesn't make much sense for them. But it does make sense for
delta-based formats, and that's where I think we should focus.
> I don't think that particular combination can be done in the immediate
> term, but getting *a* minimal interface (of methods-to-override) on
> Repository is doable.
I agree. It was the notion of trying to make RepoFormat into Storage
format that I objected to.
>> Ideally, they would have never gone on the API in the first place. But
>> now that we're here, I don't see a way of removing them without
>> significant API breakage.
> deprecation + change the code that uses them within the library. I
> consider a deprecated method as 'gone' for cleanliness purposes.
I haven't considered it that way. I assumed it wasn't "gone" if I had
to scroll past it.
>>> We've basically got a number of 'Storage-like' objects already; and they
>>> are insufficient to handle all the variation between Repository disk
>>> format capabilities. This is why I'm sceptical about adding another
>>> Storage interface being particularly better.
>> In our MPRepo discussion, you didn't think it was tasteful to renew
>> Stores and give them the kind of responsibilities I'm now suggesting for
>> Storage. That's why I'm proposing Storage rather than trying to
>> rehabilitate Stores.
> For the same basic reason I think. I think Repository needs love, rather
> than going sideways to avoid the problem (because I don't think sideways
> avoids the problem).
I think it's desirable to preserve the separation of concerns that
Stores gave us. Repositories already do plenty, and they have always
mediated their access to the underlying storage through Stores or
Versionedfiles. I think adding new responsibilities for talking to
indices, requesting raw data from the underlying storage, and creating
fulltexts from raw data is going to make the cleanliness problem worse,