[MERGE] Integration ordering

Robert Collins robertc at robertcollins.net
Tue Jun 19 04:48:16 BST 2007


On Mon, 2007-06-18 at 23:29 -0400, Aaron Bentley wrote:
> 
> Robert Collins wrote:
> > + * Deprecating versioned files as a supported API: This collaborates with the
> > +   Repository API but can probably be done by adding a replacement API for
> > +   places where the versioned-file api is used. We may well want to keep a
> > +   concept of 'a file over time' or 'inventories over time', so the existing
> > +   repository model of exposing versioned file objects may be ok; what we need
> > +   to ensure we do is remove the places in the code base where you create or
> > +   remove or otherwise describe manipulation of the storage by knit rather than
> > +   talking at the level of file ids and revision ids.
> 
> This is some nuance I hadn't heard before.  Makes sense to a degree, but
> whatever the preferred order for insertion is, we'll probably need
> interfaces that make batch insertion easy.

The streaming data interface I proposed on-list a wee bit back is, I think,
the core such interface. But it will be hard to know until we get there.
Certainly we don't want to pessimise today for a supposed win tomorrow.
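
To make "batch insertion easy" concrete, here is roughly the shape I mean -
every name below is invented for illustration, and this is not the exact
proposal from that thread:

    def insert_record_stream(repository_sink, stream):
        # 'stream' is an iterable of (key, parents, content) tuples, where
        # a key is a (file_id, revision_id) pair and parents is a tuple of
        # keys. The crucial property is that the *repository* chooses the
        # insertion order: a knit-backed repository can buffer and
        # topologically sort, while a pack- or blob-backed one can append
        # records in whatever order they arrive.
        for key, parents, content in stream:
            repository_sink.add_record(key, parents, content)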

> > The current
> > +   versioned-file API would be a burden for implementors of a blob based
> > +   repository format, so the removal of callers, and deprecation of those parts
> > +   of the API should be done before creating a blob based repository format.
> 
> I think I should point out that these changes could make knit-based
> repositories slower.  Hopefully there won't be fallout, but we should be
> aware of the possibility.

Right. The way I hope to counter that risk is to have us do the core use-case
refactorings *now*, pushing the abstraction burden further down until it's
inside Branch or Repository and not exposed outside that. I think this can be
done without performance compromises.
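
The flavour of the change, with invented names (this is not current bzrlib
code): today some callers fetch the versioned-file object itself and poke at
it; the refactored callers only ever speak file ids and revision ids.

    def get_file_lines(repository, file_id, revision_id):
        # Hypothetical facade; whether the bytes live in a knit, a pack or
        # a blob store is the repository's business, not the caller's.
        return repository.get_file_text(file_id, revision_id).splitlines(True)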

> > + * Working tree disk ordering: Knowing the expected order for disk operations
> > +   may influence the needed use case specific APIs, so having a solid
> > +   understanding of what is optimal - and why - and whether it is pessimal on
> > +   non linux platforms is rather important.
> 
> Well, apart from the create-in-limbo-parent change, I haven't found any
> particular order of operations faster than any other.  And that's
> without repository overhead.

That's good to know. I think I marked working-tree disk ordering as done in
the graph.

> > + * Be able to version files greater than memory in size: This cannot be
> > +   achieved until all parts of the library which deal with user files are able
> > +   to provide access to files larger than memory. Many strategies can be
> > +   considered for this - such as temporary files on disk, memory mapping etc.
> > +   We should have enough of a design laid out that developers of repository and
> > +   tree logic are able to start exposing apis, and considering requirements
> > +   related to them, to let this happen.
> 
> I really really hate this change.  It'll make development so much harder.

Ok. I'm not particularly pushing for it myself - because I don't version big
files. However, I think it is needed before we can really consider ourselves
mature. What can we do to reduce the impact on development?
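
As an illustration of the kind of API shift involved (hypothetical names,
just a sketch): the tree and repository layers would hand around iterators of
bounded-size chunks rather than whole texts.

    def iter_file_chunks(f, chunk_size=64 * 1024):
        # Yield a (possibly huge) file as bounded-size byte chunks so that
        # no caller ever needs the whole text in memory at once.
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

Consumers - diff, commit, sha1-ing - would then have to work a chunk at a
time, which is exactly the development cost being pointed at here.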

> > + * New container format: Its hard to tell what the right way to structure the
> > +   layering is. Probably having smooth layering down to the point that code
> > +   wants to operate on the containers directly will make this more clear.
> 
> Certainly I would like a higher-level API.  At minimum, I want to store
> "compression type" and "parent list" of 99% of my records.
> 
> And TBH, it wouldn't suck if names could be generated from revision-id,
> file-id and a string.

I can see that. I'm really saying that I'm not yet sure about several layering
issues, such as subclass vs. decorate and 'one fat API or three'; I'd like us
to get more candidate users of the higher-level APIs available so we can
triangulate effectively.
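
To show concretely what I mean by subclass vs. decorate - invented names,
just a sketch of the two layerings, not a proposal:

    class ContainerWriter(object):
        # Stand-in for the low-level container format writer.
        def add_bytes_record(self, data, names):
            pass

    # Subclass ('one fat API'): record-level methods live on the writer.
    class FatContainerWriter(ContainerWriter):
        def add_record(self, name, compression_type, parents, data):
            header = '%s %s\n' % (compression_type, ' '.join(parents))
            self.add_bytes_record(header + data, names=[name])

    # Decorate: the record-level API wraps any container writer.
    class RecordAdapter(object):
        def __init__(self, writer):
            self._writer = writer

        def add_record(self, name, compression_type, parents, data):
            header = '%s %s\n' % (compression_type, ' '.join(parents))
            self._writer.add_bytes_record(header + data, names=[name])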

> > +   bundles will become a read-only branch & repository, the smart server wants
> > +   streaming-containers, and we are planning a pack based repository, it
> > +   appears that we will have three different direct container users. However,
> > +   the bundle user may in fact be fake - because it really is a repository.
> 
> And indeed, the smart server may also wind up streaming
> repositories/bundles.  (Though I suppose your point was that it could
> stream other data as well.)

Right.

> > + * Separation of annotation cache: Making the disk changes to achieve this
> > +   depends on the new API being created. Bundles probably want to be
> > +   annotation-free, so they are a form of implementation of this and will need
> > +   the on-demand annotation facility.
> 
> OTOH, bundles may contain multiparent deltas, and those can be used for
> annotation quite efficiently.

This is true. It may be that we end up using multiparent deltas at the core of
the system and discard the xdelta concept. Do we have enough data to do a
shoot-out yet (including the serial-IO performance benefits)?
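
A toy sketch - nothing like the real multiparent code - of why annotation
comes nearly for free from such deltas: copied hunks inherit their parent's
annotations, inserted hunks get the child revision.

    def annotate_from_delta(revision_id, hunks, parent_annotations):
        # hunks: each is ('new', [lines]) for inserted text, or
        # ('copy', parent_index, start, end) for a range taken verbatim from
        # a parent. parent_annotations: per-parent lists of
        # (revision_id, line) pairs.
        annotated = []
        for hunk in hunks:
            if hunk[0] == 'new':
                for line in hunk[1]:
                    annotated.append((revision_id, line))
            else:
                _, parent_index, start, end = hunk
                annotated.extend(parent_annotations[parent_index][start:end])
        return annotated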

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.