Suggested changed to the transport API.
John A Meinel
john at arbash-meinel.com
Sat Dec 3 18:01:50 GMT 2005
Johan Rydberg wrote:
> Hi,
>
> I would like to give the Transport API a needed overhaul. As I see it
> there are two major changes needed;
>
> (1) get rid of the file-or-string arguments to the methods, replace them
> with only file-like objects, and
>
> (2) let methods such as "put" and "append" return file-like objects to
> which data can be written.
>
> My reason to want to do (1) is that it is just plain ugly.
I'm not sure that it is just plain ugly. I would be okay with updating
it, though. There are a few places which want to write a string, and I
think the old "put_controlfile" interface supported putting just a
string, so I just continued that.
>
> The main reason for (2), to switch from a "pull" to a "push" approach,
> is to make it easier for history formats such as knits and history
> deltas. If you are gonna build a file and store it using a transport
> you normally have to create a StringIO-object and invoke "put". This
> means that the whole file is kept in memory until written. Not good.
I understand your idea here, and I had thought about it. If I was going
to do it, though, I would change it such that "get" was switched to
"open" and you could open a remote file for reading or writing.
The main reason that I didn't prefer it was because Transport was meant
to be a *transport*. I have this blob, put it over there, now bring this
one over so I can work on it.
It wasn't meant to be seekable open files, etc.
We can change it to do that, but it won't work with all protocols.
>
> The current methods with current signatures can be renamed to
> something like "store" and "store_multi" if the need is there.
>
> Current usage:
>
> fp = StringIO()
> fp.write(...)
> print >>fp, 'data'
> fp.seek(0, 0)
> transport.put(path, fp)
>
> Would change to:
>
> fp = transport.put(path)
> fp.write(...)
> print >>fp, 'data'
> fp.close()
>
> The explicit close is needed to signal that we are done with the file
> and the actual contents can be stored. Some transports, like ftp,
> might need to keep a the contents in a temporary file until close() is
> invoked, but I see no problem with that.
>
> Another issue with the current API is that there is no way of knowing
> that the file-like object that "get" returns is seekable. This is
> important for both knits and history deltas.
>
> I'll write a test that verified that the objects returned by "get" has
> a seek-method, and propose we add it to bzrlib.tests, just to keep track
> that transports returns sane objects.
HTTP objects are not seekable. If you want to only get a portion of the
file, then you can use "get_partial".
By switching to get_partial, it means that not only can you get the
added bandwidth improvements for sftp and local, but you also can get it
for http (I believe it doesn't work for ftp).
If you switch to expecting to have access to the entire file, and then
seek around it, then for HTTP you have to download the entire thing into
either a local temporary file, or a StringIO.
One of our big features is that we support readonly access over plain
http, and I would hate to have that be extra slow, because we switch the
transport api.
If you are concerned about loading whole files into memory, we can
always use temporary files instead of StringIO.
Frequently what happens is you just get from somewhere, and put it
somewhere else, so you don't even care what is inside the file. (Up
until recently osutils.pumpfile did out.write(in.read()), so it read
everything anyway).
I can certainly see some benefits of switching from a get/put model to
an open/seek/etc model. But it just isn't supported well under http.
John
=:->
>
> ~j
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20051203/7dabc85d/attachment.pgp
More information about the bazaar
mailing list