[MERGE] Add simple Storage object

Mon Feb 11 05:08:11 GMT 2008

Robert Collins wrote:
> One interesting question is whether we want to get an all-in-one
> facility, or keep the separation between revision, inventory, files,
> signatures. Keeping that separation obviously implies a 4 round trip
> minimum for many operations, but OTOH they do have quite different index
> and storage needs.

I think it's also worth pointing out that differences in storage and
index needs don't necessarily mean we need to expose multiple objects in
order to support this.

For example, in our knit repositories, only file texts are annotated.
Inventories and file texts have delta compression, while revisions and
signatures do not.  Yet I was proposing that Storage unify these into an
all-in-one facility, without changing those characteristics.

I think it makes a lot of sense for an all-in-one storage object to
decide the optimum way of storing items, according to their namespace.
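
As a rough sketch of that idea (every name below is hypothetical, not
bzrlib API), a single store can look up its storage policy from the
namespace in the key. The policy table just restates the knit behaviour
described above, and zlib stands in for whatever the real on-disk
encoding would be:

```python
import zlib

# Hypothetical per-namespace policies mirroring the knit behaviour above:
# texts are annotated and delta-compressed, inventories are delta-compressed,
# revisions and signatures are neither.
POLICIES = {
    "text": {"delta": True, "annotated": True},
    "inventory": {"delta": True, "annotated": False},
    "revision": {"delta": False, "annotated": False},
    "signature": {"delta": False, "annotated": False},
}


class Storage:
    """Toy all-in-one store that picks a policy from the key's namespace."""

    def __init__(self):
        self._records = {}

    def policy_for(self, key):
        # The namespace is always the first element of the key.
        return POLICIES[key[0]]

    def add(self, key, fulltext):
        policy = self.policy_for(key)
        # A real store would delta-compress or annotate according to the
        # policy; this sketch only records the decision next to the bytes.
        self._records[key] = (policy, zlib.compress(fulltext))

    def get(self, key):
        return zlib.decompress(self._records[key][1])
```

Callers see one object and one key space; the differences in treatment
stay an internal detail.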

> I think the current stores _force_ a separation between all the things
> we serialise and make it trickier to add new things; the separation of
> concerns we have there is also flawed at the moment - we conflate
> physical storage with serialisation with model.

I agree completely.  My model for Storage was that the Storage object
would handle the physical storage, the Builder object would handle the
raw-component <-> fulltext conversion, and Repository would handle
fulltext <-> object conversion.
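
To make the layering concrete, here is a minimal sketch under heavy
assumptions: the class names follow the roles above, but the methods are
invented for illustration, fixed-size chunking stands in for the real
raw-component format, and JSON stands in for the real serialiser.

```python
import json


class Storage:
    """Physical storage only: raw components keyed by name."""

    def __init__(self):
        self._components = {}

    def put(self, key, components):
        self._components[key] = list(components)

    def get(self, key):
        return self._components[key]


class Builder:
    """Raw-component <-> fulltext conversion (here: fixed-size chunks)."""

    def to_components(self, fulltext, size=4):
        return [fulltext[i:i + size] for i in range(0, len(fulltext), size)]

    def to_fulltext(self, components):
        return b"".join(components)


class Repository:
    """Fulltext <-> object conversion (here: JSON as a stand-in)."""

    def __init__(self, storage, builder):
        self._storage = storage
        self._builder = builder

    def add_revision(self, revision_id, revision):
        fulltext = json.dumps(revision, sort_keys=True).encode("utf-8")
        components = self._builder.to_components(fulltext)
        self._storage.put(("revision", revision_id), components)

    def get_revision(self, revision_id):
        components = self._storage.get(("revision", revision_id))
        fulltext = self._builder.to_fulltext(components)
        return json.loads(fulltext.decode("utf-8"))
```

Each layer only knows about the one below it, so none of them mixes
physical storage, serialisation and model.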

> Without trying to predict what the best outcome will be, I think what we
> need to do is to alter the current internal layering, so that Pack
> repositories have the least possible friction, and put thunk layers
> in place for all the other repositories.

Full ACK.

> I propose an interface UnifiedByteStore. This is responsible for:
> storing, indexing and retrieving byte sequences with names that are a
> key tuple like ('text', fileid, revisionid), or ('revision', revisionid)
> or ('signature', revisionid) or ('inventory', revisionid).
> This /is/ RepositoryPackCollection on the Pack repository format, with
> perhaps a couple of tweaks.

This is certainly the core of Storage as I conceive it.  But I have to
say that, for conversion purposes, the ('text', ('file-id',
'revisionid')) form works better.  With that form, splitting the name up
always yields a namespace and an identifier, so you don't have to check
whether or not there was supposed to be a file-id.
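
A small illustration of the difference (the helper names are made up for
this example): with the flat form the splitter has to know which
namespaces carry a file-id, while with the nested form every key unpacks
the same way.

```python
def split_flat(key):
    """Flat form: the meaning of the tail depends on the namespace."""
    namespace = key[0]
    if namespace == "text":
        # Only texts carry a file-id, so they need special handling.
        return namespace, key[1], key[2]
    return namespace, None, key[1]


def split_nested(key):
    """Nested form: always (namespace, identifier), whatever the namespace."""
    namespace, identifier = key
    return namespace, identifier
```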

> add_bytes(key, value) -> hash
> get_bytes(keys) -> iterator of byte_sequences
> add_stream(key, object_with_read_close_methods)
> get_streams(keys) -> iterator of objects_with_read_close_methods

In many cases, objects with read and close methods are unneeded
overhead; for example, the Pack API does not provide such objects.
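
For comparison, the bytes-only half of the proposed interface can be
sketched in a few lines. This is an in-memory toy, not the Pack code;
the class name is invented, and using a SHA-1 hex digest for the
returned hash is an assumption, since the proposal does not say which
hash is meant.

```python
import hashlib


class InMemoryByteStore:
    """Toy store implementing only add_bytes / get_bytes."""

    def __init__(self):
        self._bytes = {}

    def add_bytes(self, key, value):
        self._bytes[key] = value
        # Assumption: the hash returned is the SHA-1 of the stored bytes.
        return hashlib.sha1(value).hexdigest()

    def get_bytes(self, keys):
        # Yield plain byte strings in request order; no wrapper objects
        # with read/close methods are involved.
        for key in keys:
            yield self._bytes[key]
```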

> ---
> in particular I expect the versionedfile.add_lines keyword parameters
> will be desirable for performance, but there is a good chance we can
> avoid pushing them down this far. Time will tell.

I would be surprised if we could get away without at least suggesting
compression parents.

> Graph queries for Pack repositories can be done via private methods on
> the unified store, likewise for Knits etc. I suggest this because graph
> relationships between keys are not appropriate for a bytestore

Why is it not appropriate?  It seems to have worked out quite well for
VersionedFile.

I think it would be nice if repos only talked to the UnifiedByteStore
via its public interface.  Since UnifiedByteStore will only be used by
repositories, what's the point of a private interface?

I'd be happy with a UnifiedIndex if that seems like a more reasonable
place to put graph queries.
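
Purely as an illustration of what such an index might offer publicly
(the class and method names here are hypothetical, not an existing
bzrlib interface), graph queries need nothing more than a key-to-parents
mapping:

```python
class UnifiedIndex:
    """Toy index mapping each key to the keys of its parents."""

    def __init__(self):
        self._parents = {}

    def add(self, key, parents):
        self._parents[key] = tuple(parents)

    def get_parents(self, key):
        return self._parents[key]

    def ancestry(self, key):
        """Return the set containing key and all of its ancestors."""
        seen = set()
        pending = [key]
        while pending:
            current = pending.pop()
            if current in seen:
                continue
            seen.add(current)
            # Parents missing from the index are treated as having none.
            pending.extend(self._parents.get(current, ()))
        return seen
```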

Aaron