Suggested changes to the transport API.

Robert Collins robertc at robertcollins.net
Sun Dec 4 00:42:04 GMT 2005


On Sat, 2005-12-03 at 17:36 -0600, John A Meinel wrote:

> > 
> >> By switching to get_partial, it means that not only can you get the
> >> added bandwidth improvements for sftp and local, but you also can get it
> >> for http (I believe it doesn't work for ftp).
> > 
> > FTP has 'REST' on most FTP servers these days; certainly an
> > implementation could try it. You don't need the kludgy get_partial API
> > to get the bandwidth improvements, though - having files that can seek
> > over a dumb backend protocol is completely feasible.
> 
> I think you have extra round trips if you make things be completely
> lazy, and only pull whatever is specifically requested.
> If I'm going to be wanting the entire file, it is much better to have it
> request everything, rather than just the bytes as I go.
> 
> I suppose you could use the heuristic of read with no length, versus
> reading a set of bytes.

As a for instance:
f = t.get('foo')
f.read(400) -> start a full retrieval and return once 400 bytes are
available.
f.seek(offset, whence) -> update our local offset pointer.
f.read(400) -> cancel the current read unless our offset is compatible
with it (within a fudge factor), then start a read from there.
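A minimal sketch of how such a lazy file might behave (the names
LazyTransportFile and fetch_range are hypothetical, not bzrlib API; the
fudge factor value is arbitrary):

```python
import io

class LazyTransportFile:
    """Hypothetical lazy transport file: seek only moves a local pointer;
    read extends or restarts the underlying retrieval as needed."""

    # How far past the already-fetched region a read may point before we
    # give up on the current retrieval and restart from the new offset.
    FUDGE = 8192

    def __init__(self, fetch_range):
        # fetch_range(offset, length) stands in for the backend protocol
        # (e.g. an HTTP range request); it returns bytes from the file.
        self._fetch_range = fetch_range
        self._pos = 0
        self._buffer = io.BytesIO()   # bytes fetched so far
        self._buffer_start = 0        # file offset of the buffer's start

    def seek(self, offset, whence=0):
        if whence != 0:
            raise NotImplementedError("sketch handles SEEK_SET only")
        self._pos = offset  # just update the local pointer; no I/O yet

    def read(self, size):
        buffered_end = self._buffer_start + len(self._buffer.getvalue())
        if not (self._buffer_start <= self._pos <= buffered_end + self.FUDGE):
            # Offset incompatible with the current retrieval: restart it.
            self._buffer = io.BytesIO()
            self._buffer_start = self._pos
            buffered_end = self._pos
        needed = self._pos + size - buffered_end
        if needed > 0:
            # Extend the current retrieval to cover the request.
            self._buffer.seek(0, 2)
            self._buffer.write(self._fetch_range(buffered_end, needed))
        data = self._buffer.getvalue()
        start = self._pos - self._buffer_start
        chunk = data[start:start + size]
        self._pos += len(chunk)
        return chunk
```

A real implementation would cancel an in-flight request rather than a
buffered one, but the offset bookkeeping is the same.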

> > 
> >> If you switch to expecting to have access to the entire file, and then
> >> seek around it, then for HTTP you have to download the entire thing into
> >> either a local temporary file, or a StringIO.
> > 
> > That would surely be only a first-slice implementation of seeking with
> > HTTP. Once you have one data block from HTTP you can request that the
> > response be for the same instance using the content md5, and that lets
> > you issue range requests as many times as needed.
> > 
> >> I can certainly see some benefits of switching from a get/put model to
> >> an open/seek/etc model. But it just isn't supported well under http.
> 
> I guess I'm trying to get across that a gather API would be better than
> a plain file with seek + read.
>
> Overall, I'm fine with switching the Transport API to returning smart
> file objects which do lazy reading/writing etc.
> I think there were some decent atomicity properties in the current API,
> such that put() was always an atomic action which copied all of the
> bytes, or none at all. I'm not sure how that would work with a lazy
> object. It could not install the file until close, I guess.

f = transport.get('foo')
blocks = f.read_blocks([(0, 400), (600, 20), (-400, 400)])

That's an example of a synchronous gather API based on the file. The
point is that operations on a file should be on the file object. For
prior art, see readv, writev, and the overlapped I/O functions in Win32.
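A rough sketch of what that gather call could do over a plain seekable
file (read_blocks is a hypothetical helper, not existing Transport API;
negative offsets are measured from end-of-file, as in the (-400, 400)
pair above):

```python
import os

def read_blocks(f, requests):
    """Synchronous gather read over a seekable file (cf. POSIX readv).

    requests is a list of (offset, length) pairs; a negative offset
    means "from the end of the file".  Returns the byte strings in
    request order.
    """
    results = []
    for offset, length in requests:
        # Negative offsets seek relative to end-of-file.
        whence = os.SEEK_END if offset < 0 else os.SEEK_SET
        f.seek(offset, whence)
        results.append(f.read(length))
    return results
```

A smart transport could instead coalesce the requests into one ranged
fetch per contiguous region, which is where the bandwidth win comes from.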

For atomicity there are two things: one is the atomicity of uploading a
new file, and one is the atomicity of concurrent writers to an existing
file.

For the former, I suggest that it is not the transport's problem: we put
to a file 'foo' and then rename it to 'bar'. That's something AtomicFile
can do for us, and it is fully atomic.
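A sketch of that put-then-rename dance for a local filesystem
(atomic_put is a hypothetical name, not the AtomicFile API; os.replace
is atomic on POSIX when source and target are on the same filesystem):

```python
import os
import tempfile

def atomic_put(path, data):
    """Write data to path such that readers see either the old contents
    or the complete new contents, never a partial write."""
    directory = os.path.dirname(path) or "."
    # Temp file in the same directory, so the rename cannot cross
    # filesystems (which would make it non-atomic).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data on disk before it becomes visible
        os.replace(tmp, path)     # the atomic step
    except BaseException:
        os.unlink(tmp)            # don't leave the partial temp file around
        raise
```

A lazy file object could do the same thing internally: buffer writes to
the temp name and perform the rename on close().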

For the latter, well, that's a nasty and interesting problem ;-) one for
which I think locking the repository is the best solution.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.