Transport w/ delta / offset

Thu Jul 21 03:51:16 BST 2005

Wait, is this about how we talk to the server, or about how the server itself
is implemented?

I don't see how it really makes sense if you're talking about
implementing a smart server.

But in terms of communicating with a smart server, talking about files
and directories isn't 'smart'.  When a server is smart, it should hide
all the details of how it stores and retrieves data.  Instead of talking
about these things that the client doesn't ultimately care about, it
should talk about the things the client does care about, like texts,
revisions, inventories, ancestries, etc.  I'd say the protocol for a
smart server should bear a close resemblance to the API of Branch.

Since there are other protocols that let us talk about files and
directories, it's redundant to invent a new one.  When you have a smart
server, you can vary the underlying storage implementation at will.  You can
even have caching smart server proxies.

> You could even talk to the server over http, which would mean you could
> potentially not change the transport.

You mean using query URLs?  Updating the smart server would require
POST, and maybe PUT.  You can do this, but at any rate, you're building
another protocol on top of http.

> Sort of true. But at the same time if you want to utilize the
> effectiveness of weave or revfile storage, you need to expose that
> functionality.

See my later comments about transforming weave data into a diff.  I
don't think it's as easy as you think.

> What was your idea for enabling weave merge? I don't think you want to
> unpack all of the weave files into full revisions.

I don't know of a general way that weaves can be retrieved efficiently
using a dumb server.  For files that are identical remotely and locally,
you can use rsync or something.  For diverged branches, it's hard.

Using a smart server, it's easy.  Just get it to send you changesets for
all the revisions you don't have.
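To make "changesets for all the revisions you don't have" concrete, here is a minimal Python sketch of how a smart server might decide what to send. It assumes the server knows the remote ancestry as a dict mapping each revision id to its parent ids, and that a client which has a revision also has all of that revision's ancestors. `missing_revisions` is a hypothetical name, not an existing bzr function.

```python
from typing import Dict, List, Set


def missing_revisions(remote_parents: Dict[str, List[str]],
                      remote_tip: str,
                      local_revisions: Set[str]) -> List[str]:
    """Return revisions reachable from remote_tip that the client lacks,
    ordered so that every parent comes before its children."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(rev_id: str) -> None:
        # Skip revisions already queued, and revisions the client has
        # (assumed to imply the client also has their ancestry).
        if rev_id in seen or rev_id in local_revisions:
            return
        seen.add(rev_id)
        for parent in remote_parents.get(rev_id, []):
            visit(parent)
        order.append(rev_id)  # post-order: parents are appended first

    visit(remote_tip)
    return order


# Example: the client has a and b; the server has a..e.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c", "d"]}
print(missing_revisions(parents, "e", {"a", "b"}))  # ['c', 'd', 'e']
```

The server would then send one changeset per id, in that order, so each one applies on top of revisions the client already has. Recursion is fine for a sketch; a very long history would want an explicit stack.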

> Perhaps storage then
> needs a "get_weave(file_id, revision_ids)"
> 
> Yes, it complicates the Storage interface, but you have to do something
> if you want to save it in a certain format, and then retrieve it.
> 
> 

> You are missing an obvious application, though. If I am
> branching/pulling a huge series of revisions, it would be nice to pull
> them all as diffs rather than pulling the full text for each one.

Yes, that would be nice.

> Now, we do have the copy_multi interface, which can determine who
> "other" is, to make the copying faster. Except it makes all the Storages
> need to know the specifics about each other, rather than giving them a
> generic interface that they can work through.

So if revfiles were common, I'd probably agree.

>>SmartStore.__getitem__ can be implemented in terms of
>>SmartBranch.store_get and SmartBranch.store_put.
> 
> 
> Why would you have Store use a Branch-level interface?

Given my druthers, I wouldn't have stores in a SmartBranch at all, since
they're redundant with SmartBranch functionality, and expose a
lower-level interface.  But revision_store, inventory_store and
text_store are public members of Branch, which makes them a required
part of the Branch API.  This is a fairly easy way to support them in
that context.
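A minimal sketch of that compatibility shim, assuming (hypothetically) that `store_get` and `store_put` take a store name and a file id; the thread names these methods but not their signatures. `SmartBranch` here is a toy stand-in whose "server" is an in-memory dict, and the `__setitem__` write path is an illustrative choice, not the real Store API.

```python
from typing import Dict, Tuple


class SmartBranch:
    """Hypothetical branch that talks to a smart server; the 'server' is
    modelled as an in-memory dict keyed by (store name, file id)."""

    def __init__(self) -> None:
        self._remote: Dict[Tuple[str, str], bytes] = {}

    def store_get(self, store_name: str, file_id: str) -> bytes:
        return self._remote[(store_name, file_id)]

    def store_put(self, store_name: str, file_id: str, data: bytes) -> None:
        self._remote[(store_name, file_id)] = data


class SmartStore:
    """Store-shaped facade kept only so text_store and friends still exist;
    every access is forwarded to the owning SmartBranch."""

    def __init__(self, branch: SmartBranch, store_name: str) -> None:
        self._branch = branch
        self._store_name = store_name

    def __getitem__(self, file_id: str) -> bytes:
        # Raises KeyError for unknown ids, like a dict lookup
        return self._branch.store_get(self._store_name, file_id)

    def __setitem__(self, file_id: str, data: bytes) -> None:
        self._branch.store_put(self._store_name, file_id, data)


branch = SmartBranch()
text_store = SmartStore(branch, "text_store")
text_store["hello-id"] = b"hello\n"
print(text_store["hello-id"])  # b'hello\n'
```

The store holds no data of its own, so it can't drift out of sync with the branch; it exists purely to keep the public attributes working.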

> Isn't that backwards? Doesn't branch use store?

Whatever way works best.  I can't see why the implementation of Branch
has to use Store, as long as it satisfies the interface.

> It seems like you are trading adding a little bit of complexity to the
> Storage layer, for inverting the hierarchy.

I see it as a backwards-compatibility hack.

> There are ways to optimize annotate access, even with
> CompressedTextStores, perhaps something like Tom's 'revision changed
> bits'. (I don't think his idea is extremely well fleshed out, but it
> might turn into something)

True.  One option is to produce a weave the first time you annotate.
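One way to read "do it the first time you annotate" is plain lazy caching: do the expensive work once per file, then reuse it. The sketch below is a stand-in, not a weave builder: it assumes a linear history and a hypothetical `get_text(file_id, revision_id)` callback, and it caches line-by-line annotations computed with `difflib.SequenceMatcher`.

```python
import difflib
from typing import Callable, Dict, List, Tuple

Annotation = List[Tuple[str, str]]  # (revision id, line text)


class LazyAnnotator:
    """Computes a file's annotation on the first annotate() call and caches
    it. No weave is built, and history is assumed to be linear (oldest
    revision first)."""

    def __init__(self, get_text: Callable[[str, str], List[str]],
                 history: List[str]) -> None:
        self._get_text = get_text
        self._history = history
        self._cache: Dict[str, Annotation] = {}
        self.builds = 0  # how many times the expensive pass actually ran

    def annotate(self, file_id: str) -> Annotation:
        if file_id not in self._cache:
            self._cache[file_id] = self._build(file_id)
            self.builds += 1
        return self._cache[file_id]

    def _build(self, file_id: str) -> Annotation:
        annotated: Annotation = []
        for rev_id in self._history:
            old_lines = [line for _, line in annotated]
            new_lines = self._get_text(file_id, rev_id)
            matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
            result: Annotation = []
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    result.extend(annotated[i1:i2])  # keep earlier attribution
                else:
                    # inserted/replaced lines belong to rev_id; deletions add nothing
                    result.extend((rev_id, line) for line in new_lines[j1:j2])
            annotated = result
        return annotated


texts = {("f", "r1"): ["a\n", "b\n"], ("f", "r2"): ["a\n", "B\n", "c\n"]}
annotator = LazyAnnotator(lambda fid, rid: texts[(fid, rid)], ["r1", "r2"])
print(annotator.annotate("f"))
```

A real implementation would persist whatever it builds, so the cost is paid once per file rather than once per process.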

> Any preference on the 'multiple ranges per file' versus being able to
> request the same file multiple times?

I think I would prefer the option to request ranges from multiple files
at once.  I'm not sure what the return type should be.  An iterator of
file-like objects?
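A minimal sketch of the multiple-files variant, taking the "iterator of file-like objects" idea literally. `read_ranges` and its `read_file` callback are hypothetical, and each range is assumed to be an `(offset, length)` pair in bytes; a real transport would fetch only the requested bytes rather than whole files.

```python
import io
from typing import Callable, Dict, Iterator, List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def read_ranges(read_file: Callable[[str], bytes],
                requests: Dict[str, List[Range]]
                ) -> Iterator[Tuple[str, Range, io.BytesIO]]:
    """Serve byte ranges from several files in one request, yielding
    (path, range, file-like object) in the order the ranges were given."""
    for path, ranges in requests.items():
        # Sketch only: reads the whole file, then slices out each range.
        data = read_file(path)
        for offset, length in ranges:
            yield path, (offset, length), io.BytesIO(data[offset:offset + length])


files = {"a": b"0123456789", "b": b"abcdef"}
for path, rng, f in read_ranges(files.__getitem__,
                                {"a": [(0, 3), (5, 2)], "b": [(2, 2)]}):
    print(path, rng, f.read())
```

Yielding as it goes lets the caller start on the first range before the last one arrives, and file-like objects keep code that already expects files working.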

>>Again, I don't see why we'd want to include diffs in the Store
>>interface, unless CompressedTextStore and/or WeaveStore actually did
>>produce diffs.
>>
> 
> 
> Well, it seems like WeaveStore needs to at least provide an in-memory
> Weave, and the weave format would allow for producing diffs different
> from getting plain text and running diff.

Maybe.  I'd be a little nervous about transforming weaves into diffs,
though.

> 
> For instance, to get the diff from the previous version, you just pull
> out whatever changes occurred in this version.

I think you would need to:
1. find out what lines were active in the previous version
2. find out what lines are active in this version
3. compare the two
4. for lines that were added or deleted, or for context lines, get the text
5. synthesize patch hunks.
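The five steps can be sketched against a toy weave model. The model is an assumption, not the real weave format: each line records the version that inserted it and the set of versions that deleted it, and each version's ancestry (including itself) is known up front. Steps 1 and 2 are `active_lines`; steps 3 to 5 are handled by `difflib.unified_diff`.

```python
import difflib
from typing import Dict, FrozenSet, List, NamedTuple, Set


class WeaveLine(NamedTuple):
    """One line of a toy weave (assumed model, not the real weave format)."""
    text: str
    inserted_in: str            # version that introduced the line
    deleted_in: FrozenSet[str]  # versions that deleted it


def active_lines(weave: List[WeaveLine], ancestry: Set[str]) -> List[str]:
    """Steps 1-2: the lines visible in a version whose ancestry
    (including the version itself) is `ancestry`."""
    return [w.text for w in weave
            if w.inserted_in in ancestry and not (w.deleted_in & ancestry)]


def weave_diff(weave: List[WeaveLine], ancestries: Dict[str, Set[str]],
               old: str, new: str) -> List[str]:
    old_lines = active_lines(weave, ancestries[old])
    new_lines = active_lines(weave, ancestries[new])
    # Steps 3-5: compare, pick up context lines, and emit unified-diff hunks
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old, tofile=new))


weave = [
    WeaveLine("a\n", "v1", frozenset()),
    WeaveLine("b\n", "v1", frozenset({"v2"})),  # deleted in v2
    WeaveLine("c\n", "v2", frozenset()),        # added in v2
]
ancestries = {"v1": {"v1"}, "v2": {"v1", "v2"}}
print("".join(weave_diff(weave, ancestries, "v1", "v2")))
```

In this toy model the text of deleted lines and of context lines sits right in the list, so fetching it is trivial; whether the real format makes that cheap is exactly the open question.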

It's possible that weave stores the comparison, which would let you skip
steps 1-3.  It may be very inefficient to get the text of the context
lines, since they can occur anywhere in the weave.  I am not sure
whether the text of deleted lines is included with their delete instruction.

>>I'd be inclined to do it the other way: if branch.get_diff(old_id,
>>new_id) throws an IDNotPresent, then you fall back to internal_diff.
> 
> 
> Well, diff_texts then becomes the simple interface. You always have a
> simple interface somewhere, it just depends what makes the most sense.

Given the choice, I tend to put the simpler interface on objects that
are used by other objects, rather than objects that are used directly by
client code.
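A sketch of the fallback arrangement from the quoted suggestion, reusing the thread's names (`get_diff`, `IDNotPresent`) with assumed signatures. `DumbBranch`, `get_text` and `diff_texts` are hypothetical. `diff_texts` is the simple, always-available path used underneath, and `get_diff_with_fallback` is the single entry point client code would call.

```python
import difflib
from typing import Dict, List


class IDNotPresent(Exception):
    """Raised when a branch cannot produce a diff for the given ids."""


def diff_texts(old_lines: List[str], new_lines: List[str]) -> List[str]:
    """The simple, always-available path: diff two full texts locally."""
    return list(difflib.unified_diff(old_lines, new_lines))


class DumbBranch:
    """Hypothetical branch with no server-side diff support."""

    def __init__(self, texts: Dict[str, List[str]]) -> None:
        self._texts = texts

    def get_text(self, rev_id: str) -> List[str]:
        return self._texts[rev_id]

    def get_diff(self, old_id: str, new_id: str) -> List[str]:
        raise IDNotPresent(old_id)


def get_diff_with_fallback(branch, old_id: str, new_id: str) -> List[str]:
    """Ask the branch for a diff first; if it can't, fetch both texts and
    diff them locally."""
    try:
        return branch.get_diff(old_id, new_id)
    except IDNotPresent:
        return diff_texts(branch.get_text(old_id), branch.get_text(new_id))


branch = DumbBranch({"r1": ["a\n"], "r2": ["b\n"]})
print("".join(get_diff_with_fallback(branch, "r1", "r2")))
```

A smart branch would simply return its own diff from `get_diff`, and the caller never needs to know which path was taken.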

Aaron