On remote weaves

John A Meinel john at arbash-meinel.com
Tue Aug 2 17:55:57 BST 2005


Aaron Bentley wrote:
> John A Meinel wrote:
>
>>>We probably need to work out some of the interfaces to a Storage
>>>location, since you seem very keen on using Weave for merging, we
>>>probably need a get_weave() request for store.
>
>
> I think this adds an unnecessary requirement, since all of the data in a
> weave can also be produced from flat stores.
>
> So one approach is to copy all the relevant revisions from the remote
> branch to the local branch, and then do the weave using local data.  So
> you don't have to worry about the RemoteStore interface-- only
> Branch.update_revisions needs to.  And you don't need to convert data,
> because we'll assume your local storage is already a weave.
>
> Since we don't need a lot of weave data for a merge, just the history
> since any last common ancestor, I'm not convinced that we need local
> weave storage at all-- we can just generate them at need, and the cost
> is proportional to the number of commits to that file since the last merge.
>

I would prefer not to see weave storage, since I don't like storage
mechanisms which modify in place.

I thought building a weave was costly enough to be worth caching, but if
you can show that it isn't, I have no problem with that.
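
To make the "generate them at need" idea concrete, here is a rough sketch
of working out which revisions would have to go into such a weave. It
assumes the ancestry is available as a plain dict mapping a revision id to
its parent ids; that layout and both function names are hypothetical, not
bzr's API.

```python
def ancestors(graph, rev_id):
    """Return rev_id plus every revision reachable through parent links.

    `graph` is a hypothetical dict: revision id -> list of parent ids.
    """
    seen = set()
    pending = [rev_id]
    while pending:
        rev = pending.pop()
        if rev in seen:
            continue
        seen.add(rev)
        pending.extend(graph.get(rev, []))
    return seen


def revisions_since_common_ancestry(graph, local_tip, remote_tip):
    """Split the unshared history into (local_only, remote_only) sets.

    Only these revisions (plus the text of a common base) would need to be
    fed into a weave built at need; shared history is skipped entirely.
    """
    local = ancestors(graph, local_tip)
    remote = ancestors(graph, remote_tip)
    return local - remote, remote - local
```

The work done after the graph walk scales with the size of the two
returned sets, which is the "commits since the last merge" cost described
above. Whether that is cheap enough to skip caching is still something to
measure.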

>
>>>The other possibility was to make a SmartBranch, which started to
>>>override more of the Branch operations. Aaron was thinking that it could
>>>serve up the .text_store and .inventory_store, etc on its own (not a
>>>separate Storage + Transport class). He feels that they are part of the
>>>public interface of Branch, and thus need to be preserved.
>
>
> The reason I say they're part of the public interface is because their
> names don't begin with '_'.  That makes them public variables, and part
> of the interface.
>

You're right, it does in practice, but that doesn't tell us whether the
lack of an '_' is an explicit design decision or just a mistake.

>
>>>From my experience, though, the *_store members are more of an
>>>implementation detail. Everyone else should be going through the
>>>get_revision() type interfaces.
>
>
> The problem with that is commands like update_revisions, which operate
> on another branch's *store members.
>

Except that is done *within* branch. Generally, when a class has private
members, another object of the same class can access them, because it
should know what is going on, and how to maintain consistency.

It is different for one Branch to access another Branch's private
variables, versus having a Command do so.
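
A minimal sketch of that distinction, using a toy class rather than the
real bzrlib Branch. The `_text_store` name (with the leading underscore),
`add_text` and `get_text` are assumptions for illustration only.

```python
class Branch:
    """Toy stand-in for a branch; not the real bzrlib class."""

    def __init__(self, name):
        self.name = name
        # Leading underscore: an implementation detail, not public API.
        self._text_store = {}

    def add_text(self, text_id, text):
        self._text_store[text_id] = text

    def get_text(self, text_id):
        """Public accessor that commands are expected to go through."""
        return self._text_store[text_id]

    def update_revisions(self, other):
        """Copy texts that are missing locally from another Branch.

        Reaching into other._text_store is tolerated here because both
        objects are Branches and maintain the same invariants.
        """
        copied = []
        for text_id, text in sorted(other._text_store.items()):
            if text_id not in self._text_store:
                self._text_store[text_id] = text
                copied.append(text_id)
        return copied
```

A Command would call `get_text()` and never touch `_text_store`, while
`update_revisions()` is free to, since it lives inside the class.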

Perhaps it is worth the optimization to have a Branch access a
SmartBranch's storage directly, but potentially SmartBranch is involved
enough for that not to be the case.

>
>>>(Otherwise they just get files, rather
>>>than getting Revision objects).
>
>
> Actually, they get the texts assigned to those ids.  The underlying
> files may be different.

Sure, they get some form of the flat text, rather than getting a python
object.

>
>
>>>There might be some places that go directly to the store, but I feel
>>>those probably should just be cleaned up.
>
>
> It's conceivable, but to me it looks like there's not a lot of return
> for the effort.
>
>
>>>A SmartBranch would take quite a bit more to implement. It's probably
>>>worth it, but I think we could have nice remote operations working over
>>>SmartTransport a lot sooner.
>
>
> I don't see that.  xml-rpc looks quite easy to layer onto existing objects.
>
> Like this: (from
> http://www.onlamp.com/pub/a/python/2001/01/17/xmlrpcserver.html)
>     def call(self, method, params):
>         # Look up the requested method by name on this server object.
>         print("Dispatching:", method, params)
>         try:
>             server_method = getattr(self, method)
>         except AttributeError:
>             raise AttributeError(
>                 "Server does not have XML-RPC procedure %s" % method)
>         return server_method(method, params)

It is, but that doesn't make an optimal communication protocol. For
instance, when requesting an object, it would frequently be good to
alert them to what objects you already know about, so that it can send
smaller chunks.

For instance, I could tell the server that I need the inventory for
revision X, but I have the inventory for its parent. The server could
then just send a diff of X versus its parent, rather than sending the
full text. For large inventories, this would probably be quite a bit of
bandwidth savings.
Likewise for the text-store. Especially when we go to tie in the
revision-id into the text-id, you can request the delta, and then
rebuild the new text on the local side.

Also, when requesting a series of entries, you could have them sent
delta after delta.
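
The exchange could look roughly like the sketch below. It is only an
illustration: `ToyServer`, `get_inventory` and its `have` argument, and the
copy/insert delta format are all made up for this example, and the delta
is computed with Python's difflib rather than anything bzr-specific.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Describe new_lines as copies from old_lines plus inserted lines."""
    delta = []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new_lines[j1:j2]))
        # "delete" needs no entry: those old lines are simply not copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the new text on the client from the old text and a delta."""
    new_lines = []
    for op in delta:
        if op[0] == "copy":
            new_lines.extend(old_lines[op[1]:op[2]])
        else:
            new_lines.extend(op[1])
    return new_lines


class ToyServer:
    """Hypothetical server holding full texts keyed by revision id."""

    def __init__(self, texts):
        self._texts = texts

    def get_inventory(self, rev_id, have=None):
        """Return ("delta", ops) if the client names a revision we also
        hold, otherwise ("full", lines)."""
        wanted = self._texts[rev_id]
        if have is not None and have in self._texts:
            return ("delta", make_delta(self._texts[have], wanted))
        return ("full", list(wanted))
```

A client holding the parent calls `get_inventory("X", have="parent")` and
runs `apply_delta()` on its local copy. For a series of entries, it would
apply each delta to the result of the previous one.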

>
> I don't see what's lacking in the sftp protocol.  Locking and pipelining
> are both explicitly supported, as well as lstat.  It's just that by
> using the commandline sftp client you limit yourself to what it
> supports, and you aren't able to take full advantage of the protocol.
>
> I can understand that it would be fun to implement your own remote
> filesystem protocol, but unless there are capabilities SFTP lacks, I
> think it makes more sense to use what's already out there.

I have no problem with a more advanced use of the SFTP protocol. I
haven't ever really seen anywhere where it was fully defined. If it has
explicit locking, pipelining, appending and lstat, it seems to provide
everything we need. Is there an official SFTP definition, or is it just
whatever happens to be implemented by the OpenSSH people? (I realize
some portion of it has to be official, but are locking, etc official, or
just common extensions?)

I fully admit my implementation is limited in its support; its only merit
is that it exists, and I didn't have the paramiko library to play with.

I would be happier to have a fully functional implementation over plain
SFTP rather than a smart server, because I like not having to run a
different daemon on the remote machine. I'm very fond of arch for having
dumbfs support, and being able to do everything with just sftp would be
sufficient for me.

I suppose some would like regular ftp support, but I don't know if the
ftp protocol supports everything that we would need.

>
> Aaron

John
=:->


More information about the bazaar mailing list