[PATCH] remote_branch with asyncore

Robert Collins robertc at robertcollins.net
Tue Jun 28 01:16:27 BST 2005


On Mon, 2005-06-27 at 17:37 -0400, Aaron Bentley wrote:
...
> Now that I think about it, I don't think there are any interesting
> properties of ImmutableStore.  We should always invoke
> copy_multi_helper_parallel.

I'm glad to have helped by triggering thought ;).

> However, in the case where we're downloading from one RemoteBranch while
> uploading to another, we probably will want to support optimizing on a
> class-by-class basis.  Especially when you look at updating revfiles.  I
> think trying to introduce abstractions here would lead to extremely
> specific 'abstractions'.

Well, revfile indexes are strictly per-branch, and the same goes for
weave-on-disk. That is, there is no 'please do a bulk copy' that will
degenerate into bit-copying in the general case - but the specific case
(the revfiles are identical up to point X, so we can bit-copy the two)
would be fairly well served by something like this:
assume we have a FileId object that resides in a Branch. To get a
specific revision we can ask for
  file.getRevisionText(revision)
and RevFile offers this protocol, as does a WeaveFile, or a FileStoreId
or somesuch.

Then the obvious optimisable code path is something like:

if FileId.can_stream_update(source_file, target_file):
    Connection.stream_update_files(source_connection,
                                   source_file.data_files,
                                   target_connection)
else:
    missing_revisions = [revision for revision in source_file.revisions
                         if revision not in target_file.revisions]
    for revision in missing_revisions:
        target_file.add_revision(revision)

where Connection is the class for Connections / Transports (I'm calling
it Connection because to me that implies a higher-level facility. You
might, for example, have a RemoteShellConnection with ssh, rsh and
telnet Transports for it).
stream_update_files is something that streams up new bytes in a file
from source_connection to the same file in target_connection.
source_file.data_files is a list of all the files used by the FileId
object - i.e. for a file store it's all the text copies, for a revfile
it's the index and the data file, etc. The above code looks clean to me,
and able to optimise all the way up to rsyncing the bulk of the files if
the two transports support that (that's why stream_update_files is not a
method on either connection object but on the class). The
can_stream_update function *may* need to do a __class__ == __class__
test, but I think I can (cleanly) avoid that - I need to check some
reference material here; it smells of a familiar pattern.
Something like (this is wrong, I need to do that lookup):

def can_stream_update(klass, source, target):
    """Return true if source starts with a byte-compatible
    representation of target.
    """
    return source.storage == target.storage and source.startswith(target)
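
To make the shape of that dispatch concrete, here is a minimal runnable
sketch. The class and method names (VersionedFile, the startswith prefix
check, copy_missing) are illustrative assumptions standing in for
RevFile/WeaveFile, not the real bzr API:

```python
class VersionedFile:
    """Toy stand-in for a RevFile/WeaveFile: an ordered list of revision texts."""
    storage = "revfile"

    def __init__(self, revisions=None):
        self.revisions = list(revisions or [])

    def startswith(self, other):
        # Byte-compatibility proxy: target must be a strict prefix of source.
        return self.revisions[:len(other.revisions)] == other.revisions

    @classmethod
    def can_stream_update(cls, source, target):
        """Return True if target is a byte-compatible prefix of source."""
        return source.storage == target.storage and source.startswith(target)

    def add_revision(self, revision):
        self.revisions.append(revision)


def copy_missing(source_file, target_file):
    """Stream raw bytes when possible, else fall back to per-revision copying."""
    if VersionedFile.can_stream_update(source_file, target_file):
        # A real stream_update_files would bulk-copy the trailing bytes here.
        target_file.revisions = list(source_file.revisions)
        return "streamed"
    missing = [r for r in source_file.revisions
               if r not in target_file.revisions]
    for r in missing:
        target_file.add_revision(r)
    return "per-revision"
```

A target that is a prefix of the source takes the streaming path; any
other target degrades gracefully to the generic copy loop.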

There is another, less obvious, optimisation than just ensuring byte
equality: you could copy the pre-calculated deltas from a revfile
across, giving them new index entries as they land - as long as the top
of your specific graph is already present. I'm not sure that is a
good optimisation though, as the tree balance could be quite different
in the target - IMO it's likely better to always generate the best graph
for the target branch.

However, as I don't have a good feel for the whole code base, I'll not
argue the point further.

> > I also note there are no new tests - unit or otherwise - for this.. I
> > suspect that remote access is one area where tests are best put in
> > early ;).
> 
> It's an area that's rather tricky to unit test at all.  Suggestions welcome.

That's why it's so important to start early ;).

MockObjects are probably a good bet for testing the client logic - build
a small set of code that pretends to be an http client and lets your
client logic think it's achieving connections, errors etc - and you can
tell the right methods were called. If that's hard, the API we are using
is wrong ;).
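
For instance, with Python's unittest.mock (a hand-rolled stub would do
just as well; fetch_revision and the '/revision/...' path are invented
here for illustration, not RemoteBranch's real interface):

```python
from unittest import mock

def fetch_revision(http_client, revision_id):
    """Toy client logic: ask the transport for a revision, propagate errors."""
    response = http_client.get('/revision/%s' % revision_id)
    if response is None:
        raise KeyError(revision_id)
    return response

# The mock stands in for the real http client and records every call.
client = mock.Mock()
client.get.return_value = 'revision-text'

assert fetch_revision(client, 'rev-1') == 'revision-text'
# Verify the client logic issued exactly the request we expected.
client.get.assert_called_once_with('/revision/rev-1')
```

The client logic never notices it is talking to a fake, and the test can
assert on both the return path and the exact requests made.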

We don't need to test the actual http client logic unless we take
ownership of it. (If it's untested, that's a good reason to keep
looking ;)).

If we want to test the actual http code for whatever reason, it's about 3
lines of python to bring up a test server on a high port and talk to it.
Another 4 or 5 to attach various objects to it - such as a filesystem
for putting against. That really falls in the functional test basket
though.
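
With today's standard library that looks roughly like this (in 2005 the
module was BaseHTTPServer; the HelloHandler below is a toy, and binding
to port 0 lets the OS pick a free high port):

```python
import http.server
import threading
import urllib.request

class HelloHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging during tests.
        pass

# Port 0 asks the OS for any free high port.
server = http.server.HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen('http://127.0.0.1:%d/' % server.server_port) as resp:
    fetched = resp.read()
server.shutdown()
```

Attaching richer handlers (a do_PUT backed by a temporary directory, say)
is the extra 4 or 5 lines for the filesystem case.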

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.