plugin: sftp transport

John A Meinel john at arbash-meinel.com
Thu Oct 20 04:08:54 BST 2005


Martin Pool wrote:
> On 20/10/05, John A Meinel <john at arbash-meinel.com> wrote:
>
>>That was the original idea behind Transport.*_multi (at least for the
>>performance concerns; it doesn't really handle an event-based GUI very
>>well).
>>
>>But I did think of a use case which would be nicer with async. If you
>>are branching or pulling from a remote tree, you can create a queue of
>>files which need to be downloaded. You start with revision-history;
>>once that comes in, you grab the last revision and request it, along
>>with that revision's inventory.
>>When the inventory comes in, you can add the texts you don't have to
>>the queue.
>>When the revision comes in, you can request its ancestors, along with
>>their inventories.
>>This at least seems like a fancy async system, where you keep making
>>requests for new stuff as the information comes in.
>
>
> I think you could build this on top of something like the HTTP clients
> you and I did, but perhaps you'd just end up reinventing Twisted.

My point was that the Transport.*_multi() functions are designed around
"I know of 50 files (chunks) I want, give them to me". The call grabs
them and returns (it generally yields results as they come in, but that
is not guaranteed).

You then do stuff with the returned values, and make another request
for (hopefully) a whole bunch of files at once.
That isn't quite the same thing as an async system which keeps building
up the queue as results arrive.
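
Roughly, the difference looks like this. (A sketch only: I'm assuming
get_multi() yields (name, content) pairs, which may not match the real
signature, and find_more() and get_any() are hypothetical -- get_any()
is exactly the async primitive we don't have.)

def fetch_batched(transport, start_names, find_more):
    # Round-based: request a batch, process everything, then build the
    # next batch. get_multi() may yield as results arrive, but we still
    # advance one "wave" of requests at a time.
    pending = list(start_names)
    while pending:
        next_batch = []
        for name, content in transport.get_multi(pending):
            next_batch.extend(find_more(name, content))
        pending = next_batch

def fetch_queued(transport, start_names, find_more):
    # Queue-based: issue new requests the moment any single result
    # comes in, so the connection never sits idle between rounds.
    queue = list(start_names)
    while queue:
        name, content = transport.get_any(queue)  # assumed primitive
        queue.remove(name)
        queue.extend(find_more(name, content))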

>
>
>>On the other hand....
>>I guess if you just requested the entire inventory.weave you can get
>>most of this from local operations, since it has all the inventories,
>>the ancestry information, etc. (You still need to make a request for
>>all the revisions, but that is easily a get_multi().)
>>
>>However, .bzr/inventory.weave is now 1.5MB, and can't be broken up
>>(yet). So if you are doing a bzr pull of 1 revision, it still has to
>>download at least 1.5MB.
>
>
>>In the future, when we have an indexed append-only weave format, in
>>theory you could download only the portions of the inventory.weave
>>that you don't already have. The needed offsets should be contained in
>>the index file, so you probably could make some pretty efficient
>>requests just with get('inventory.weave.idx'), then get_partial_multi()
>>all of the chunks that you don't have in the local weave, and
>>get_multi() all of the revision-store entries you don't have.
>
>
> I'm working on that now.  I think it'll be possible.
>
> We could use a similar approach to hold all the revisions in a single
> file, and then after reading the index use a single request to fetch
> all the revisions we don't have yet.

Sure. And that could easily be an indexed weave. You might look at my
revstore2weave plugin, along with my SAX work to make revisions somewhat
weave-compressible. The plugin is just a way to play with converting all
revisions into a weave; I'm not saying it should be any sort of final
code. It is simple enough that any final update code would probably be
written from scratch.
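
To make the fetch side concrete, it could look something like the
following. (Again a sketch: parse_weave_index() is made up, the index
format isn't designed yet -- I'm assuming it maps version-id ->
(offset, length) -- and I'm guessing at the get_partial_multi() calling
convention.)

def fetch_missing(transport, local_weave, local_revisions):
    # The index should be small, so grab it whole.
    index = parse_weave_index(transport.get('inventory.weave.idx'))

    # Request only the byte ranges of versions we don't already have.
    wanted = [('inventory.weave', offset, length)
              for version, (offset, length) in index.items()
              if version not in local_weave]
    chunks = transport.get_partial_multi(wanted)

    # And one get_multi() for the revision-store entries we lack.
    missing = [v for v in index if v not in local_revisions]
    texts = transport.get_multi(['revision-store/%s' % v
                                 for v in missing])
    return chunks, texts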

>
>
>>>Or maybe just hang on to them indefinitely, or until the connection drops.
>>
>>My concern here is front-ends that are using bzrlib. For instance,
>>Emacs, or some other GUI. The common thing to do is to connect for a
>>single operation (status, diff, merge, etc.) and then disconnect. And I
>>believe a GUI is going to be running for hours, versus bzr the CLI
>>running for seconds (in a perfect world, <1s).
>
>
> I don't see any problem in holding an ssh connection open for hours.
> I do it all the time.  (Maybe I missed your point?)

It isn't a problem at the protocol level. It is a problem at the
expectation level. At least from my point of view, for stuff that uses
ssh as a transport (not as a shell), it is much more common to connect,
do what you are doing right now, and then disconnect.

I don't know of any of the standard programs (Cervisia, Subversion,
WinCVS, etc.) which stay connected. They usually connect once per
status, diff, etc., though that is mostly because they are designed to
spawn a backend process, which requires it.

It feels odd to have a non-terminal application stay connected when it
doesn't need the connection, especially since it might be left open for
days at a time.
Perhaps we could expose this slightly, so that a front-end could call
something like "bzrlib.keep_connections_alive(True)", or maybe a
variant with a timeout.

John
=:->

