[MERGE] fetch to --dev6-rr should not use deltas

Robert Collins robert.collins at canonical.com
Wed Apr 8 23:32:14 BST 2009


On Wed, 2009-04-08 at 17:27 -0500, John Arbash Meinel wrote:
> ...
> > 
> > Ok. The comment was pretty much undecipherable, which is why I changed
> > the line and noted that gc doesn't use it. There are tests that test it
> > is set to True. And setting it to False will cause a lot more network
> > traffic on large files than we really need.
> > 
> > We still need to do skinny deltas in groupcompress, and when we do this
> > will become more relevant still. The behaviour we want then, is:
> >  - deltas are only sent when the fulltext was sent earlier.
> > 
> > And really has nothing to do with using deltas per-se.
> > 
> > We can set topological here. Or we can even set groupcompress ordering.
> > 
> 
> Or we can just tell InterDifferingSerializer to *always* use topological
> ordering. I'm not completely convinced that always adapting in the
> target is the best answer.

IDS (InterDifferingSerializer) won't trigger over the network; I think
using IDS for this is a bug, and I want us to work on eliminating it.

> Consider that when extracting from pack-0.92 and knits, they already
> have a lot of code to do efficient extraction of multiple fulltexts from
> a given KnitDeltaClosure. Such that even memory is saved by sharing the
> lines between different texts (at least until they are joined together
> to create a single fulltext string).

Sure, but it will still grab all of them, as I noted.

> 
> > I suggest that what we should do is:
> >  - set groupcompress ordering
> >  - set use_deltas to True (which means that we will be sent deltas the 
> >    source believes we can use).
> >  - change the rule for groupcompress ordering to include the
> >    requirement that deltas come after their basis always.
> > 
> 
> We can do that, though it is going to cost a lot more during the
> 'sort_groupcompress' stage. As a start, the groupcompress sort works
> purely on the parent_map, and doesn't know about the build chain at that
> time. We can teach it, I'm mostly just mentioning that the scope of this
> change is not a simple "change the sort algorithm".

Right, I know, which is why I suggest we don't do this for 1.14.
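
To illustrate the scope, here is a minimal sketch (not bzrlib code) of
what "deltas come after their basis" adds on top of an existing ordering.
The names basis_first and basis_of are hypothetical; basis_of stands in
for the build-chain information that a parent_map-only sort does not have.

```python
def basis_first(order, basis_of):
    """Reorder `order` so that every key follows its delta basis.

    `order` is a candidate emission order (for example the output of a
    groupcompress-style sort). `basis_of` is a hypothetical mapping from
    a key to the key it is a delta against, or None for a fulltext.
    """
    wanted = set(order)
    emitted = set()
    result = []
    for key in order:
        # Walk up the build chain, collecting bases not yet emitted.
        chain = []
        cur = key
        while cur is not None and cur in wanted and cur not in emitted:
            chain.append(cur)
            emitted.add(cur)  # marking here also guards against cycles
            cur = basis_of.get(cur)
        # Emit the oldest basis first, then the deltas built on it.
        result.extend(reversed(chain))
    return result


if __name__ == "__main__":
    # Newest-first order with a build chain c -> b -> a.
    chain = {"c": "b", "b": "a", "a": None}
    print(basis_first(["c", "b", "a"], chain))
```

The extra walk up the chain for each key is the additional cost at sort
time; a basis that is not part of the fetch is simply skipped.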

> 
> > The net effect of this will be:
> >  - fetching from pack repositories will be somewhat topological - we'll
> >    get long runs (delta length) in topological order, but these runs 
> >    themselves will be grouped by fileid and triggered by traversing
> >    the revision ids in reverse topological order.
> >  - fetching from groupcompress repositories will be in optimal
> >    groupcompress order
> > 
> > We can set use_deltas to False for 1.14, but we shouldn't leave it that
> > way in bzr.dev.
> > 
> > -Rob
> 
> My other big concern for groupcompress ordering is that it isn't
> 'stable', as it depends on what order things are yielded from a dict(),
> which will depend on how big the dict is, what python version is used,
> etc. Which means that just because the source is in groupcompress order,
> doesn't mean that it will fetch in purely optimal order.
> 
> The nice thing about using 'unordered' is that after running 'bzr pack',
> or even just an auto-pack, you are guaranteed to get the results on a
> group-by-group basis, which should be fairly optimal for both sides.

I think it would be relatively easy to make it stable: use a list
rather than a dict to hold things, and sort() each group of things we
add to the queue.
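
A rough sketch of that idea, assuming a plain reverse-topological walk
over a parent_map; this is not the existing sort_groupcompress code, and
stable_reverse_topo is a hypothetical name. Pending work lives in a list,
and each batch is sorted before it is queued, so the output no longer
depends on dict iteration order.

```python
def stable_reverse_topo(parent_map):
    """Return keys children-before-parents in a deterministic order.

    `parent_map` maps each key to a tuple of its (distinct) parent keys.
    Parents missing from the map are ignored.
    """
    # Count, for each key, how many children still have to be emitted.
    pending_children = {key: 0 for key in parent_map}
    for parents in parent_map.values():
        for parent in parents:
            if parent in pending_children:
                pending_children[parent] += 1

    # A list, not a dict or set, holds the work queue. It is seeded in
    # sorted order so the starting point is deterministic too.
    queue = sorted(
        (k for k, n in pending_children.items() if n == 0), reverse=True
    )
    result = []
    while queue:
        key = queue.pop()
        result.append(key)
        ready = []
        for parent in parent_map[key]:
            if parent in pending_children:
                pending_children[parent] -= 1
                if pending_children[parent] == 0:
                    ready.append(parent)
        # sort() each group before adding it to the queue.
        ready.sort(reverse=True)
        queue.extend(ready)
    return result


if __name__ == "__main__":
    graph = {"a": (), "b": ("a",), "c": ("a",), "d": ("b", "c")}
    print(stable_reverse_topo(graph))
```

Building the same graph with its keys inserted in a different order gives
the same result, which is the property we want here.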

-Rob