Suggested changes to the transport API.

John A Meinel john at arbash-meinel.com
Sat Dec 3 23:36:41 GMT 2005


Robert Collins wrote:
> On Sat, 2005-12-03 at 12:01 -0600, John A Meinel wrote:
> 
>> HTTP objects are not seekable. If you want to only get a portion of the
>> file, then you can use "get_partial".
> 
> That API was a kludge: 
> 
>   [ For readers following along at home, get_partial returned a
> file-like object that started at an offset and went forward from
> there. ]
> 
> It had essentially the same result for a user as 'open the file; seek to
> the offset; start reading', but it made Transport responsible for
> seeking, and you could only seek once: you had to open the file again
> later to get more data, and there is no possibility of (for instance)
> building up a memory map of the regions of the file that *have* been
> read.

Actually, the real idea of get_partial was that, combined into
get_partial_multi, you would have a gather read.
The problem with seek + read is that they are separate commands, so
the transport doesn't know whether my read(10) means I only want
those 10 bytes before I seek again, or whether another read(10) is
about to follow.

I think we still want something more than just seek + read, though it
certainly could be implemented in a different way.
get_partial_multi() was really where the magic was supposed to be; a
sketch follows below. Though I know you didn't really like the
*_multi() API anyway. I did show there was a large improvement to be
gained from get_multi() of the revision files on an initial branch,
but that doesn't mean it would be advantageous everywhere.
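
To make the gather idea concrete, here is a rough sketch. The class
and method here are illustrative only, not the real bzrlib Transport
API:

    class GatherTransport(object):
        """Sketch of a gather-read interface (hypothetical names).

        The caller describes every region it wants up front, so the
        transport can coalesce adjacent ranges and fetch them all in
        a single round trip (one SFTP readv, one HTTP multi-range
        request, and so on).
        """

        def get_partial_multi(self, relpath, offsets):
            """Read several regions of the file at relpath.

            :param offsets: list of (start, length) tuples.
            :return: list of byte strings, one per region, in the
                order requested.
            """
            raise NotImplementedError

    # Contrast with seek + read, where each region can cost a round
    # trip because the transport never sees the whole plan:
    #
    #   f = transport.get('revisions')
    #   f.seek(100); chunk_a = f.read(10)
    #   f.seek(5000); chunk_b = f.read(10)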

> 
>> By switching to get_partial, you not only get the added bandwidth
>> improvements for sftp and local, you also get them for http (I
>> believe it doesn't work for ftp).
> 
> FTP has 'REST' for most FTP servers these days, certainly an
> implementation could try it. You don't need the kludgy get_partial
> API to get the bandwidth improvements though - having files that can
> seek with a dumb backend protocol is completely feasible.

I think you get extra round trips if you make things completely lazy
and only pull whatever is specifically requested.
If I'm going to want the entire file, it is much better to request
everything up front rather than fetching the bytes as I go.

I suppose you could use the heuristic that read() with no length
means "fetch the whole file", while read(n) means "fetch just those
bytes".
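
As a rough illustration of that heuristic over HTTP (a sketch only:
the urllib2 usage is simplified, there is no error handling, and a
server that ignores Range would silently return the whole body):

    import urllib2

    class LazyHttpFile(object):
        """Sketch of a lazy, seekable file-like object over HTTP."""

        def __init__(self, url):
            self._url = url
            self._pos = 0

        def seek(self, offset):
            self._pos = offset

        def read(self, size=None):
            request = urllib2.Request(self._url)
            if size is not None:
                # Bounded read: the caller probably wants only this
                # region, so ask the server for just that range.
                request.add_header('Range', 'bytes=%d-%d'
                                   % (self._pos, self._pos + size - 1))
            elif self._pos:
                # Unbounded read from an offset: everything to EOF.
                request.add_header('Range', 'bytes=%d-' % self._pos)
            data = urllib2.urlopen(request).read()
            self._pos += len(data)
            return data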

> 
>> If you switch to expecting access to the entire file, and then
>> seeking around in it, then for HTTP you have to download the entire
>> thing into either a local temporary file or a StringIO.
> 
> That would surely be only a first-slice implementation of seeking with
> HTTP. Once you have one data block from HTTP you can request that the
> response be for the same instance using the content md5, and that lets
> you issue range requests as many times as needed.
> 
>> I can certainly see some benefits of switching from a get/put model to
>> an open/seek/etc model. But it just isn't supported well under http.

I guess I'm trying to get across that a gather API would be better
than a plain file with seek + read.
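
For what it's worth, the instance-pinning you describe maps onto
standard HTTP headers. You mention the content md5; the sketch below
uses ETag and If-Match instead (my assumption about the mechanism),
and again omits error handling:

    import urllib2

    url = 'http://example.com/branch/revisions'

    # First range request; remember which instance of the file the
    # server gave us.
    first = urllib2.Request(url)
    first.add_header('Range', 'bytes=0-1023')
    response = urllib2.urlopen(first)
    etag = response.info().getheader('ETag')
    header_block = response.read()

    # Later range requests pin the same instance: if the file has
    # changed on the server, If-Match makes the request fail with
    # 412 Precondition Failed rather than silently mixing versions.
    later = urllib2.Request(url)
    later.add_header('Range', 'bytes=4096-8191')
    if etag:
        later.add_header('If-Match', etag)
    chunk = urllib2.urlopen(later).read()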

Overall, I'm fine with switching the Transport API to returning smart
file objects which do lazy reading/writing, etc.
The current API does have some decent atomicity properties, though:
put() was always an atomic action which copied either all of the
bytes or none at all. I'm not sure how that would work with a lazy
object. It could hold off installing the file until close(), I guess.
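
That close()-time install could look something like this (a sketch
only; the class name is made up, and it assumes a local-filesystem
transport where rename is atomic):

    import os
    import tempfile

    class AtomicPutFile(object):
        """Sketch: writes go to a temp file, and the final name only
        appears on close(), so readers see all of the bytes or none.
        """

        def __init__(self, final_path):
            self._final_path = final_path
            # Create the temp file in the same directory so the
            # rename stays on one filesystem (atomic on POSIX).
            fd, self._tmp_path = tempfile.mkstemp(
                dir=os.path.dirname(final_path))
            self._file = os.fdopen(fd, 'wb')

        def write(self, data):
            self._file.write(data)

        def close(self):
            self._file.close()
            os.rename(self._tmp_path, self._final_path)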

John
=:->

> 
> ???
> 
> Rob

