[Request] Remote operation need to be cached

Tue Oct 4 09:55:08 BST 2005

On Tue, Oct 04, 2005 at 09:39:44AM +0200, Jan Hudec wrote:
> On Tue, Oct 04, 2005 at 03:25:26 -0400, James Blackwell wrote:
> > > Alexander Belchenko wrote:
> > > >I'm trying to use bzr pull because rsyncing for this BIG repository is 
> > > >very long operation.
> >   
> > On Mon, Oct 03, 2005 at 09:15:54AM -0500, John A Meinel wrote:
> > > rsyncing is going to be *way* faster than bzr pull. Usually 5-10x. Like 
> > > rsyncing the bzr.dev tree takes < 3 min, but bzr pull it takes about 
> > > 17min for me.
> > > 
> > > There is some discussion about CentralizedStorage which would be sort of 
> > > a local cache.
> > 
> > The last time I heard this issue seriously discussed, centralized storage
> > was practically a given. Other than Doing It, the thing standing in the
> > way is defining clearly how to prune out less valuable data from the
> > cache.
> 
> AIUI centralized storage is not a cache nor anything that even
> remotely resembles one. It also can't prune less valuable data, because
> it can't know which data that would be.

The understanding I came away with was that these two things are so
similar that they may as well be treated as the same thing.  Both of them
are collections of changes.

In all fairness, they aren't identical. I can see these differences
between a "caching patch pool" (CPP) and a "centralized storage" (CS).

 * In CS, the patches are valuable and should be retained. In CPP they are
   disposable. 

 * In CS, the patches are intentionally there. In CPP, they are there by
   happenstance.

 * Related to the above, CS will be less entropic than CPP.

 * In CPP one has an expectation of automatic pruning. In CS automatic
   pruning gets developers pruned ("Think you can delete my code? Well,
   take THAT").

 * CPP is generally going to be closer than CS, which in turn will be
   closer than remote.

That said, the similarities between CS and CPP are too strong to ignore.

 * Both are a collection of patches for branches.

 * Both serve as caches of a sort. Both will likely have similar, if not
   identical, storage formats.

 * Both are likely to be, in effect, caches of more authoritative data
   held elsewhere (CPP certainly; CS frequently, depending upon
   implementation).

 * CS can be implemented within a CPP framework by setting no expiry.
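
To make that last point concrete, here is a minimal sketch in Python. It
is not anything that exists in bzr; the names PatchPool, max_age, add, get
and prune are all made up for illustration. A pool created without a
max_age never prunes and so behaves like CS, while a pool with a max_age
drops old entries the way a CPP would.

```python
import time
from typing import Dict, List, Optional, Tuple


class PatchPool:
    """Hypothetical pool of patches keyed by revision id."""

    def __init__(self, max_age: Optional[float] = None) -> None:
        # max_age is in seconds. None means "no expiry", i.e. CS behaviour.
        self.max_age = max_age
        self._patches: Dict[str, Tuple[float, bytes]] = {}

    def add(self, revision_id: str, data: bytes,
            now: Optional[float] = None) -> None:
        # Record when the patch arrived so prune() can age it out later.
        stamp = time.time() if now is None else now
        self._patches[revision_id] = (stamp, data)

    def get(self, revision_id: str) -> Optional[bytes]:
        entry = self._patches.get(revision_id)
        return None if entry is None else entry[1]

    def prune(self, now: Optional[float] = None) -> List[str]:
        """Drop expired patches and return their revision ids."""
        if self.max_age is None:
            return []  # CS mode: nothing is ever disposable
        current = time.time() if now is None else now
        expired = [rid for rid, (stamp, _) in self._patches.items()
                   if current - stamp > self.max_age]
        for rid in expired:
            del self._patches[rid]
        return expired
```

Age is only a stand-in here for "less valuable"; how to decide what is
actually worth keeping is the open question mentioned earlier.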

> Actually, it could know, using some smart tricks with hardlinks, but it
> would only work if all working copies using that centralized storage
> are on the same partition and that partition supports hardlinks.
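
For what it's worth, the hardlink trick could look roughly like the sketch
below. It assumes the store keeps one regular file per stored item and
that every working copy hardlinks to the files it uses; unreferenced_files
is a hypothetical name, not a bzr function. Under those assumptions a link
count of 1 means only the store itself still refers to the file.

```python
import os
import stat
from typing import List


def unreferenced_files(store_dir: str) -> List[str]:
    """Return names of regular files in store_dir with no other hardlinks."""
    orphans = []
    for name in sorted(os.listdir(store_dir)):
        info = os.lstat(os.path.join(store_dir, name))
        # st_nlink counts directory entries pointing at this inode. If it
        # is 1, the store's own entry is the only one left.
        if stat.S_ISREG(info.st_mode) and info.st_nlink == 1:
            orphans.append(name)
    return orphans
```

This only tells you what nobody links to any more. It says nothing about
working copies on another partition, which is exactly the limitation
described above.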

> > > I'm not sure how it would work with the new weaves, though.
> > 
> > Excellent question.
> 
> Why shouldn't it? It should work just as the in-tree storage in .bzr,
> except it would allow multiple heads (revisions with no descendants).

My thought on this is that a file-storage mechanism is well suited to
breaking apart previous "revisions". In a weave, these revisions are put
together in a way that makes them more difficult to tease apart. I suspect
that in practice the effort to tease a single revision out of a weave
will be higher than that of just storing the entire branch.

Not too long ago I discussed at length the idea of having caches conflate
older data (the idea would _certainly_ work for CPP, probably not for CS).
I was able to get the idea across to some, but I wasn't able to convince
everyone. I think the failure wasn't in the idea itself, but in the way I
described it.
