Transport w/ delta / offset

Aaron Bentley aaron.bentley at utoronto.ca
Thu Jul 21 00:14:52 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
> Aaron Bentley wrote:

> Well, you could have a SmartTransport that coupled with a SmartStorage,
> such that it could fulfill the standard get()/put() requests, but would
> also have advanced knowledge. There would be tighter coupling between
> Storage & Transport in that case, but it is probably okay.

Okay, but why is there value in combining high-level operations with
low-level ones?

Transports operate in terms of files and directories.  Smart servers
operate in terms of revisions, inventories, trees, lines of ancestry,
etc.  There's not a whole lot of overlap, so I don't see an advantage in
combining the two.

> I'm thinking you could have:
> bzr://some/sort/of/path
> Which would instantiate the SmartTransport, which could even connect to
> the smart server, and start asking for files. In Branch.__init__()
> (right now in _check_format()) it would realize that it should
> instantiate SmartStorage instances.
> 
> Would it be okay to have Storage be able to yield a diff?

Right now, Stores have a clean and simple API that's easy to
reimplement.  If it were just one function, I'd probably not care.  But
I think it's actually a lot of functions, if you go down that path.


> To me, there are 2 aspects. First, with a smart server, when you say
> "give me this file" you might have the previous file and want to only
> get the diff, and re-create it on your end. That can be internal to a
> SmartStorage, and SmartTransport (basically smart transport has a
> get_diff, which SmartStorage knows how to use, but it isn't generally
> exported at the Transport level).
> 
> The second aspect, is that frequently a Branch wants a diff, and certain
> storage formats are going to optimize for that. For instance, depending
> on the specific request, revfiles might store exactly the diff that you
> want, no need to re-create it.

I suppose that if Revfiles made diffs easily accessible, there would be
a case for this.  But bear in mind that revfiles can only produce a few
diffs for a text (diff from an arbitrary parent, diff to any
descendant), so their value may be limited.

So here's another way you can do it:

SmartBranch can have the methods store_get(self, namespace, id),
store_put(self, namespace, id, stream) and text_diff(self, from_id, to_id)

SmartStore.__getitem__ can be implemented in terms of
SmartBranch.store_get and SmartBranch.store_put.

e.g. SmartBranch.__init__ would include:

        self.inventory_store = SmartStore(self, "inventory")

That's even simpler than the other Stores.

SmartBranch.get_text_diff could be invoked if the user didn't specify
any special diff options.  But it would fail if the it did not have one
of the texts available.  And if the user did supply special options, it
would be bad to run 'diff' remotely, so the interface doesn't permit
special options.

So you'd do something like

def diff_texts(b, from_id, to_id, external_diff_options=None):
    if external_diff_options()
        diff = external_diff(...)
    elif hasattr(b, "text_diff"):
        try:
            diff = b.text_diff(from_id, to_id)
        except IDNotPresent:
            diff = None
    else:
	diff = None
    if diff is None:
        diff = internal_diff(...)

A SmartBranch.revision_tree could produce a ChangesetTree.
SmartBranch.append_revision could operate server-side.  It could iterate
through all a revision's acestors on the server side.

> That's why I was wondering where the weave stuff was going. Are the
> higher level operations going to *require* a weave? So that if you are
> using the "CompressedTextStore" doing a merge requires it to rebuild the
> weave?

annotate and weave merge will probably require weave data.  But for
weave merge, you only need to go back as far as the most recent common
ancestor(s), so it's not that bad.  bzrtools provides an annotate that
works with CompressedTextStores, but its performance over a network
connection would be hideous.

> My thought was to have get_partial() take a list of files and ranges,
> something like:
> 
> def get_partial(self, ranges):

Yes, that could work nicely.

>>I don't think transports should not be concerned with diffing.  I'd say
>>that's a branch-level concern.
> 
> 
> I generally agree. And I think SmartStorage can be more intelligent
> about a SmartTransport, but it doesn't have to be exposed at the
> Transport layer.
> 
> But would it be okay to expose it as part of the Storage layer? Possibly
> with a way of doing "storage.get_diff()" returning 'I don't have it,
> build it yourself'.

Again, I don't see why we'd want to include diffs in the Store
interface, unless CompressedTextStore and/or WeaveStore actually did
produce diffs.

> I am tempted to pass get_diff a function that can generate the diff from
> 2 texts in the case that the storage doesn't already have it.

I'd be inclined to do it the other way: if branch.get_diff(old_id,
new_id) throws an IDNotPresent, then you fall back to internal_diff.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC3trr0F+nu1YWqI0RAod5AJ4o2HokSsP7qjrknIy9a6g5yLyccACeJjbb
Pov+68SKXX1tJB3uw9M4tVY=
=l0B2
-----END PGP SIGNATURE-----




More information about the bazaar mailing list