What is KnitPackStreamSource _for_

Wed Jul 1 16:40:38 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

> For our formats, when fetch_order != 'topological' we have atomic
> inserts. We could make that an explicit attribute on the repo format
> too.
> 
>> 2) Similarly about fetching in topological order for Revisions (it does
>> a topo_sort that we don't need.)
> 
> A topo sort is useful because it means adjacent revisions are adjacent
> on disk. So it's not -needed- but actually there is a bug open somewhere
> about knitpack not ordering on disk properly.

Well, 'bzr pack' on a KnitPack repository sorts in *reverse topological*
order, so streaming it in topological order means seeking backwards
through the pack.
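A toy illustration of the seek pattern, assuming a purely linear history and fixed-size records. The helper names and the 100-byte record size are hypothetical; this is not bzrlib code. The pack is laid out newest-first, then read oldest-first, and every step moves backwards:

```python
from graphlib import TopologicalSorter  # Python 3.9+


def pack_offsets(order, record_size=100):
    """Assign on-disk offsets to records written in the given order."""
    return {rev: i * record_size for i, rev in enumerate(order)}


def seek_deltas(read_order, offsets):
    """Distance moved between consecutive reads; negative means seeking backwards."""
    return [offsets[b] - offsets[a] for a, b in zip(read_order, read_order[1:])]


# Linear history: each revision's only parent is the previous one.
parents = {"rev-1": [], "rev-2": ["rev-1"], "rev-3": ["rev-2"], "rev-4": ["rev-3"]}
topological = list(TopologicalSorter(parents).static_order())  # parents first
reverse_topological = list(reversed(topological))              # children first

# The pack is written in reverse topological order ...
offsets = pack_offsets(reverse_topological)
# ... but the stream is read in topological order.
print(seek_deltas(topological, offsets))  # [-100, -100, -100]
```

Reading in the order the pack was written would give only forward steps of +100.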

> 
>> 3) Because it has to handle all permutations, the code is *much* harder
>> to write and read, and certainly harder to make perform optimally.
>> Having a simple "data X to data X" stream is much cleaner (IMO). For
>> example, all of the code to handle converting between rich roots, or
>> chk formats, or...
>>
>> I'm not sure how you want this documented. I'll be happy to flesh out
>> whatever you would like.
> 
> Well, I really want to get to one code path that we can test rigorously
> and be sure is complete. So I want to have precisely two sinks and
> sources:
>  * real
>  * remote proxy
> 
> Without getting to that point it's a _lot_ harder to be confident about
> behaviour and robustness over the network.
> 
> If we are going to have subclasses, we should make *old* or *less
> capable* formats have the subclasses, not the default implementation.
> 
> -Rob

IMO, it just changes where the 'if' statement is.

You can either have one giant 'if' statement in "get_stream()", or you
can select a different StreamSource. It was *much* easier to write a
simple stream source that only concerns itself with same-format
transfers. The code is (IMO) *much* cleaner, since you don't have all of
the little if clauses "am I converting to rich roots, am I fetching
topologically, am I...". Being easier to write isn't a perfect metric,
but if the code was easier to write, it is probably less complex, which
probably means it is easier to understand, support and maintain.
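A minimal sketch of the second option, where the 'if' moves out of get_stream() and into the choice of source. RepoFormat, SameFormatStreamSource, ConvertingStreamSource, get_stream_source and the record tuples are all hypothetical stand-ins, not bzrlib's actual classes or signatures:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class RepoFormat:
    """Hypothetical stand-in for a repository format (not bzrlib's class)."""
    name: str
    rich_root: bool
    supports_chk: bool


class SameFormatStreamSource:
    """Fast path: formats match, so records are passed through unchanged."""

    def __init__(self, fmt: RepoFormat) -> None:
        self.fmt = fmt

    def get_stream(self, revision_ids: Iterable[str]) -> Iterator[Tuple]:
        for rev in revision_ids:
            yield ("raw", rev)


class ConvertingStreamSource:
    """Generic path: every 'am I converting to ...?' branch lives here."""

    def __init__(self, source: RepoFormat, target: RepoFormat) -> None:
        self.source = source
        self.target = target

    def get_stream(self, revision_ids: Iterable[str]) -> Iterator[Tuple]:
        for rev in revision_ids:
            steps = []
            if self.target.rich_root and not self.source.rich_root:
                steps.append("add-root")
            if self.target.supports_chk != self.source.supports_chk:
                steps.append("convert-inventory")
            yield ("converted", rev, tuple(steps))


def get_stream_source(source: RepoFormat, target: RepoFormat):
    """The one remaining 'if': pick the specialised source when formats match."""
    if source == target:
        return SameFormatStreamSource(source)
    return ConvertingStreamSource(source, target)


knitpack = RepoFormat("knitpack", rich_root=False, supports_chk=False)
rich_chk = RepoFormat("rich-chk", rich_root=True, supports_chk=True)
print(list(get_stream_source(knitpack, knitpack).get_stream(["r1"])))
# [('raw', 'r1')]
print(list(get_stream_source(knitpack, rich_chk).get_stream(["r1"])))
# [('converted', 'r1', ('add-root', 'convert-inventory'))]
```

The same-format source never has to ask any conversion questions; all of them are confined to the converting source.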

I'm of the opinion that having an optimized "this is the same format"
path and a less optimized but still decent "I'm going between formats"
path is a reasonable place to be. It doesn't take much to implement test
parameterization that covers both cases: you permute across all formats
for the same-format path, and then add a few key cross-format pairs.
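A rough sketch of that scenario list, assuming illustrative format names (not necessarily bzr's registered format names) and a hand-picked set of cross-format pairs. Every format is tested against itself, plus only the conversions judged important, instead of the full cross product:

```python
from typing import List, Tuple

# Hypothetical format names; a real suite would enumerate the registered formats.
ALL_FORMATS = ["knit", "knitpack", "knitpack-rich-root", "2a"]

# A hand-picked handful of cross-format pairs worth covering explicitly.
KEY_CROSS_FORMATS = [
    ("knitpack", "knitpack-rich-root"),  # adding rich roots
    ("knitpack-rich-root", "2a"),        # converting to a chk format
]


def fetch_scenarios() -> List[Tuple[str, str, str]]:
    """Return (scenario id, source format, target format) triples."""
    same = [(f"{fmt}->{fmt}", fmt, fmt) for fmt in ALL_FORMATS]
    cross = [(f"{src}->{dst}", src, dst) for src, dst in KEY_CROSS_FORMATS]
    return same + cross


scenarios = fetch_scenarios()
print(len(scenarios))         # 6 scenarios ...
print(len(ALL_FORMATS) ** 2)  # ... instead of 16 for every source/target pair
```

With pytest, for instance, these triples could be fed to `@pytest.mark.parametrize`.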

The complex code is kept in one location, and the bulk of your
transfers get to use nice, clean, simple and optimized code paths.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpLg3YACgkQJdeBCYSNAAN1cQCfZv3Mg6TMOoI+vP0uoNi57C3W
fLAAnjwxl9A5AQIZAq/5yG3B2O6rio8X
=AjRv
-----END PGP SIGNATURE-----