Transport w/ delta / offset

Wed Jul 20 17:32:47 BST 2005

Aaron Bentley wrote:
> John A Meinel wrote:
>
>>>I was thinking about what would be possible for a smart server
>>>implementation, and whether it could be done with the current Storage
>>>and Transport layers.
>
>
> I don't think it can.  It mars the otherwise-clean separation of
> concerns.  I think it makes more sense to have a SmartBranch.
>
> Another option would be to introduce a new BranchStorage interface that
> Branch uses, and can be satisfied with either a DumbBranchStorage
> interface that uses Transport and Store objects or a SmartBranchStorage
> that uses a smart server.
>

Well, you could have a SmartTransport coupled with a SmartStorage, so
that it could fulfill the standard get()/put() requests but would also
have more advanced knowledge. There would be tighter coupling between
Storage and Transport in that case, but that is probably okay.

I'm thinking you could have a URL like:
bzr://some/sort/of/path
which would instantiate the SmartTransport, which could even connect to
the smart server and start asking for files. In Branch.__init__()
(right now in _check_format()) it would realize that it should
instantiate SmartStorage instances.
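Roughly what I mean, as a sketch only: every class and the scheme table here are hypothetical stand-ins, not the real bzrlib Transport or Store classes. The URL scheme picks the transport, and the branch then picks the storage type based on which transport it got:

```python
from urllib.parse import urlparse


class Transport:
    """Hypothetical stand-in for a dumb transport (get()/put() only)."""
    def __init__(self, base):
        self.base = base


class LocalTransport(Transport):
    pass


class SmartTransport(Transport):
    """Hypothetical transport for bzr:// URLs; would talk to a smart server."""
    pass


class TextStorage:
    """Hypothetical dumb storage built on any Transport."""
    def __init__(self, transport):
        self.transport = transport


class SmartStorage(TextStorage):
    """Hypothetical storage that knows about SmartTransport's extras."""
    pass


# Hypothetical registry: URL scheme -> transport class ("" = plain path).
TRANSPORTS = {"": LocalTransport, "file": LocalTransport, "bzr": SmartTransport}


def get_transport(url):
    scheme = urlparse(url).scheme
    try:
        return TRANSPORTS[scheme](url)
    except KeyError:
        raise ValueError("unsupported URL scheme: %r" % scheme) from None


def pick_storage(transport):
    # This is the decision Branch.__init__ / _check_format() would make.
    if isinstance(transport, SmartTransport):
        return SmartStorage(transport)
    return TextStorage(transport)
```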

Would it be okay to have Storage be able to yield a diff?

To me, there are two aspects. First, with a smart server, when you say
"give me this file" you might already have the previous version and want
to get only the diff, then re-create the file on your end. That can be
internal to SmartStorage and SmartTransport (basically, SmartTransport
has a get_diff() which SmartStorage knows how to use, but it isn't
exported at the general Transport level).
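A rough sketch of that exchange, assuming line-based texts and a made-up delta format (a list of ('copy', start, end) and ('insert', lines) ops). make_delta would run on the server behind a hypothetical SmartTransport.get_diff(), and apply_delta would run inside SmartStorage on the client. It uses Python's difflib, not any bzr diff code:

```python
import difflib


def make_delta(base_lines, new_lines):
    """Server side: describe new_lines as copies from base_lines plus new lines."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The client already has these lines; just say where they are.
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # 'replace' or 'insert': only the new lines go over the wire.
            ops.append(("insert", new_lines[j1:j2]))
        # 'delete': nothing to send, the client simply doesn't copy those lines.
    return ops


def apply_delta(base_lines, ops):
    """Client side: rebuild the new text from the base text it already holds."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out
```

Only the changed lines travel; unchanged regions cost a pair of integers each.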

The second aspect is that a Branch frequently wants a diff, and certain
storage formats are going to be optimized for that. For instance,
depending on the specific request, revfiles might store exactly the diff
that you want, with no need to re-create it. Or for weave files, the way
a diff is stored and produced would be different.

That's why I was wondering where the weave stuff was going. Are the
higher-level operations going to *require* a weave? So that if you are
using the "CompressedTextStore", doing a merge requires rebuilding the
weave?
In some ways, I'm okay with that. But I would prefer not to be too
dependent on the specific storage layer, because there are other
reasons to prefer different storage mechanisms.

>
>>>Or possibly the Revfile format.
>>>Specifically, Revfile would want to get portions of a file, rather than
>>>reading in the whole thing. So probably the Transport layer needs to
>>>have functions for 'getting' only a portion of a file. The local
>>>filesystem and http both would support just getting a portion of a file.
>>>You could get() the index files, and then get_partial() the pieces of
>>>the revfile.
>
>
> Yes, Martin and I discussed reading revfiles that way at UDU.  It
> doesn't look like the revfile format is going to be used, though.  And
> the current weave format requires reading the whole file.  There's
> nothing wrong with get_partial if it's useful, but since you'll want to
> read multiple portions of the revfile, it would be really good for it to
> support batch operation.

My thought was to have get_partial() take a list of files and ranges,
something like:

def get_partial(self, ranges):

It could be done two different ways: one allowing multiple ranges per
file, the other allowing just one range per file, with the same file
repeated as needed:

get_partial([('f1', (10, 20), (30, 40)),
             ('f2', (100, -1))])

or
get_partial([('f1', 10, 20), ('f1', 30, 40), ('f2', 100, -1)])

# -1 (or possibly None) could be used to signal the rest of the file

The former makes it a little more obvious to the Transport what should
be bundled together, while the latter is less nested, and you can
easily just loop over the entries and return them one at a time (with
a generator).
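Converting between the two forms is cheap, so a Transport could accept either one. A minimal sketch, assuming (start, end) pairs are half-open like Python slices and that -1 (or None) means "to the end of the file"; both helper names are made up:

```python
def flatten_ranges(requests):
    """Nested -> flat: ('f1', (10, 20), (30, 40)) becomes two ('f1', s, e) entries."""
    for entry in requests:
        name, spans = entry[0], entry[1:]
        for start, end in spans:
            yield (name, start, end)


def group_ranges(flat):
    """Flat -> nested: one entry per file, in order of first appearance."""
    grouped = {}
    for name, start, end in flat:
        grouped.setdefault(name, []).append((start, end))
    return [(name,) + tuple(spans) for name, spans in grouped.items()]
```

flatten_ranges is already a generator, so a Transport that prefers the flat form can consume nested requests lazily.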

For LocalTransport, it would have to re-open the file anyway (so that it
can access the different ranges), or possibly just return StringIO
objects if the range size is below a certain threshold.
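Something like this for the local case. LocalPartialReader is a hypothetical stand-in for LocalTransport, the 64 KiB cutoff is an arbitrary assumption, and ranges are assumed half-open with -1/None meaning "rest of file". io.BytesIO plays the role of StringIO for binary data:

```python
import io
import os


class LocalPartialReader:
    """Hypothetical sketch of a local get_partial(); not the bzrlib API."""

    SMALL_RANGE = 64 * 1024  # assumed cutoff for reading a range into memory

    def __init__(self, base_dir):
        self.base_dir = base_dir

    def get_partial(self, ranges):
        """Yield (name, start, length, file_obj) for each (name, start, end)."""
        for name, start, end in ranges:
            f = open(os.path.join(self.base_dir, name), "rb")
            if end is None or end == -1:
                end = os.fstat(f.fileno()).st_size
            length = max(0, end - start)
            f.seek(start)
            if length <= self.SMALL_RANGE:
                # Small range: read it now and hand back an in-memory copy.
                data = f.read(length)
                f.close()
                yield name, start, length, io.BytesIO(data)
            else:
                # Large range: a fresh handle positioned at `start`; the caller
                # must read at most `length` bytes and close it.
                yield name, start, length, f
```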

For HttpTransport, is it possible to ask a server for several ranges in
a single request, or do those have to be sent as multiple requests? I'm
guessing it is the latter, but I don't know HTTP/1.1 very well.
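For reference, HTTP/1.1 does let one request name several byte ranges in its Range header (RFC 2616, section 14.35). A server that honours it answers 206 Partial Content with a multipart/byteranges body, but a server is also allowed to ignore the header and send the whole file with 200, so a client has to handle both. HTTP byte positions are inclusive, so building the header from half-open (start, end) pairs, with -1/None meaning "rest of file", looks roughly like this (range_header is a hypothetical helper):

```python
def range_header(spans):
    """Build an HTTP/1.1 Range header value from half-open (start, end) pairs."""
    parts = []
    for start, end in spans:
        if end is None or end == -1:
            parts.append("%d-" % start)            # open-ended: to end of file
        elif end > start:
            parts.append("%d-%d" % (start, end - 1))  # HTTP end is inclusive
        # zero-length spans are skipped; "10-9" would be an invalid range
    return "bytes=" + ",".join(parts)
```

The value would go into an ordinary request, e.g. urllib.request.Request(url, headers={"Range": range_header(spans)}).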

>
>
>>>For a smart server, you would frequently only want to request a delta,
>>>and it seems foolish to transmit the full text, just to compute a delta.
>>>And if you are using a revfile format, you wouldn't want to unpack 2
>>>full versions, just to combine them back into a delta.
>>>
>>>The problem is that the Storage layer is where you know what to Diff
>>>against, but the Transport layer is really where you should be doing the
>>>delta.
>
>
>>>Does this sound reasonable, or is it putting too much intelligence in
>>>something that is supposed to be low level?
>
>
> I don't think transports should be concerned with diffing.  I'd say
> that's a branch-level concern.

I generally agree. And I think SmartStorage can be more intelligent
about a SmartTransport, but it doesn't have to be exposed at the
Transport layer.

But would it be okay to expose it as part of the Storage layer? Possibly
with a storage.get_diff() that can return 'I don't have it, build it
yourself'.

So that the Storage layer would only return diffs that it had stored,
and wouldn't really worry about how they had been created.

I am tempted to pass get_diff a function that can generate the diff from
two texts in case the storage doesn't already have it. That way it is
still a single function call, but the diff-generating function comes
from Branch rather than being built into Storage.
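A minimal sketch of that get_diff() shape, using a hypothetical DiffCachingStore: it returns a stored diff if the format keeps one, otherwise calls the function Branch passed in, and returns None ("build it yourself") if no function was given. The unified() helper is just difflib standing in for whatever Branch would really use:

```python
import difflib


class DiffCachingStore:
    """Hypothetical store that may hold some precomputed diffs."""

    def __init__(self):
        self.texts = {}  # text id -> list of lines
        self.diffs = {}  # (old id, new id) -> diff, if the format stored one

    def get(self, text_id):
        return self.texts[text_id]

    def get_diff(self, old_id, new_id, make_diff=None):
        stored = self.diffs.get((old_id, new_id))
        if stored is not None:
            return stored          # the format already had exactly this diff
        if make_diff is None:
            return None            # "I don't have it, build it yourself"
        return make_diff(self.get(old_id), self.get(new_id))


def unified(old_lines, new_lines):
    # Branch-level diff function; Storage never needs to know how it works.
    return list(difflib.unified_diff(old_lines, new_lines))
```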

>
> Aaron

John
=:->