Suggested changes to the transport API.

Sat Dec 3 19:39:24 GMT 2005

John A Meinel <john at arbash-meinel.com> writes:

> I understand your idea here, and I had thought about it. If I was going
> to do it, though, I would change it such that "get" was switched to
> "open" and you could open a remote file for reading or writing.
> The main reason that I didn't prefer it was because Transport was meant
> to be a *transport*. I have this blob, put it over there, now bring this
> one over so I can work on it.

That was my initial plan, and I still like it.  But Robert said he
would like a distinction between read and write handles.

> HTTP objects are not seekable. If you want to only get a portion of the
> file, then you can use "get_partial".
>
> By switching to get_partial, it means that not only can you get the
> added bandwidth improvements for sftp and local, but you also can get it
> for http (I believe it doesn't work for ftp).
>
> If you switch to expecting to have access to the entire file, and then
> seek around it, then for HTTP you have to download the entire thing into
> either a local temporary file, or a StringIO.

Yes, the http protocol is a problem.  My solution was to return an
HttpFile object that implements "seek" by just keeping track of the
cursor, and then issues get_partial-like requests when the user
invokes "read".
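A minimal sketch of the lazy HttpFile idea, in modern Python.  The
class name comes from the paragraph above, but the constructor, the
injectable `opener` argument and the fallback for servers that ignore
`Range` are assumptions made for illustration, not bzr's actual code.
It relies only on the standard `urllib.request` module and the HTTP
`Range` header:

```python
import urllib.request


class HttpFile:
    """Read-only, seekable view of a remote HTTP resource (sketch only).

    Nothing is fetched when the object is created: seek() just moves a
    cursor, and read() issues a Range request for the bytes asked for.
    """

    def __init__(self, url, opener=urllib.request.urlopen):
        self.url = url
        self._pos = 0
        # Injectable (an assumption of this sketch) so the class can be
        # exercised without a network.
        self._opener = opener

    def seek(self, offset, whence=0):
        if whence == 0:
            self._pos = offset
        elif whence == 1:
            self._pos += offset
        else:
            # Seeking from the end needs the resource length (e.g. from a
            # HEAD request); left out of this sketch.
            raise ValueError("whence=2 is not supported in this sketch")

    def tell(self):
        return self._pos

    def read(self, size=-1):
        if size == 0:
            return b""
        start = self._pos
        if size < 0:
            byte_range = "bytes=%d-" % start  # from the cursor to the end
        else:
            # HTTP byte ranges are inclusive at both ends.
            byte_range = "bytes=%d-%d" % (start, start + size - 1)
        request = urllib.request.Request(self.url, headers={"Range": byte_range})
        with self._opener(request) as response:
            data = response.read()
            if getattr(response, "status", None) == 200:
                # The server ignored Range and sent the whole body; slice it.
                data = data[start:] if size < 0 else data[start:start + size]
        self._pos += len(data)
        return data
```

With this, `f.seek(offset); f.read(length)` transfers roughly `length`
bytes, while a plain `f.read()` fetches the rest of the file in a single
request.  A server that ignores `Range` still works, just without the
bandwidth saving.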

> One of our big features is that we support readonly access over plain
> http, and I would hate to have that be extra slow, because we switch the
> transport api.

I totally agree.  I just don't think switching the API would make
http access slower.

> If you are concerned about loading whole files into memory, we can
> always use temporary files instead of StringIO.
> Frequently what happens is you just get from somewhere, and put it
> somewhere else, so you don't even care what is inside the file. (Up
> until recently osutils.pumpfile did out.write(in.read()), so it read
> everything anyway).

I would rather see a better transport API than resort to using
temporary files.
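On the osutils.pumpfile point quoted above: copying through a
fixed-size buffer keeps memory bounded without either reading the
whole file at once or spooling it to a temporary file.  The helper
below uses a hypothetical name, `copy_in_chunks`, and is not bzr's
actual pumpfile; Python's standard `shutil.copyfileobj` implements the
same loop.

```python
import io


def copy_in_chunks(from_file, to_file, chunk_size=32 * 1024):
    """Copy from_file to to_file through a fixed-size buffer.

    Hypothetical helper for illustration; the 32 KiB default is an
    arbitrary choice.  Peak memory is about chunk_size, unlike
    to_file.write(from_file.read()), which holds the whole file.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = from_file.read(chunk_size)
        if not chunk:  # an empty read means end of file
            break
        to_file.write(chunk)
        total += len(chunk)
    return total


# Usage: copy 100,000 bytes between in-memory files, 4096 bytes at a time.
src = io.BytesIO(b"x" * 100000)
dst = io.BytesIO()
copied = copy_in_chunks(src, dst, chunk_size=4096)
```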

> I can certainly see some benefits of switching from a get/put model to
> an open/seek/etc model. But it just isn't supported well under http.

There is nothing that says the file has to be fetched when "open" is
called.  Instead, fetching can be done lazily.  As I see it, there will
be two standard use cases:

 1)  open, and then read the whole contents.
 2)  open, seek, and read a portion of the file. (for knits, hdeltas)

(1) is simple to implement, and so is (2) if the actual fetching of
the data is deferred until "read" is invoked.

Thanks for your comments.  Have a nice evening.

~j