[MERGE][#178353] Smart request to stream revisions without buffering

Fri Jan 11 03:38:24 GMT 2008

John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> 1) This seems like it can only break up data into "chunks" with a knit-hunk 
> granularity. So if you add a 100MB file, then it will send a 100MB chunk. 
> Conversely, if you have 100 2-byte files, it will send 100 "chunks". I'm 
> okay with it, but it does seem like separating the chunking layer from the 
> data your are transmitting would let you do something like "always 1MB 
> chunks".

This is true, but not really any worse than the rest of bzrlib, which still
holds entire knit hunks in memory when reading or writing them.  So there's not
much benefit to fixing this.  Perhaps we should revisit this when we rework the
network format for transferring revisions to be more efficient for packs.

So I think it's ok as is.

> 2) I still don't see anything about handling a Knit stream put into a Pack 
> repository, etc. (Or maybe more explicitly requesting the kind of stream 
> you want.)

Yeah.  Like the previous method, this fetch method only triggers when both ends
are in the same format.  So it provides an optimisation in certain
circumstances, but falls back to something slower (and correct) for everything
else.  We ought to fix this to be used in more situations, but I don't think
that needs to be fixed at the same time as removing the buffering.  

>
> 3)

I guess this one isn't so important ;)

> === modified file 'doc/developers/index.txt'
> --- doc/developers/index.txt    2007-12-05 18:58:49 +0000
> +++ doc/developers/index.txt    2008-01-02 03:08:59 +0000
> @@ -34,7 +34,6 @@
>  * `Network protocol <network-protocol.html>`_ |--| Custom network 
> protocol.
>
>  Data formats
> -============
>
>  * `Knit pack repositories <packrepo.html>`_ |--| KnitPack repositories
>    (new in Bazaar 0.92).
>
> ^- This doesn't seem like a valid change. I realize there is only one 
> entry, but it seems like they should be in their own (sub)section.

Fixed.  That must have been an accident.

> 4)
> +                try:
> +                    # These records should have only one name, and that 
> name
> +                    # should be a one-element tuple.
> +                    [name_tuple] = record_names
> +                except ValueError:
> +                    raise errors.SmartProtocolError(
> +                        'Repository data stream had invalid record name 
> %r'
> +                        % (record_names,))
>
> Odd construct, why not just do:
>
> if len(record_names) != 1:
>   raise ...
>
> Using [foo] = bar as you have is... odd. At least you could use a tuple 
> instead with (foo,) = bar or for extra points "foo, = bar".  (No I'm not 
> advocating it, but showing why I think it is a poor way to do it.)

The thinking was:

  * I need to unpack it anyway, because I yield that value on the next line
    after your quote
  * Tuple unpacking for one element is too easily confusable with other
    constructs (“(foo) = bar” doesn't work, and the options with commas work,
    but someone could easily not realise the comma is necessary.)
  * List unpacking works fine, and the syntax for a single element is sane.

Also, the thing I don't like about:

    if len(record_names) != 1:
        raise ...
    name_tuple = record_names[0]

is that there's a bit of repeated information, and it's not wholly obvious at a
glance that the “1” and the “0” should be kept in sync.  So there's scope for a
change to allow an inadvertant error.  Whereas the unpacking in a try/except
states in exactly one place that the sequence should have one element.  (And
there's also a tiny bit of distaste at the repeated work.)

To be honest I don't think either your way or mine is great, but I don't think
it matters hugely either.  So I'll do it your way, but I thought I should
explain my preference.  For cases where there's more than one element being
unpacked I think the try/except form is the clear winner, though.

> Otherwise I'm happy to see that it wasn't terrible to get a genuinely 
> streaming send. The buffering was pretty bad on large repositories. I am 
> curious about the overhead of the chunk headers, but they generally seem 
> small. (A hex length and a newline is log16(data)+1 extra bytes.)

Yeah, I'm not too worried about this.  Latency is a much much larger problem
than bandwidth for us atm anyway.

-Andrew.