[RFC] Bundles as repositories

Robert Collins robertc at robertcollins.net
Tue Jun 19 02:59:11 BST 2007


On Fri, 2007-06-15 at 03:20 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > I think this has been roughly discussed, but wanted to be sure it had..
> > 
> > I think we want a compact binary bundle format, with the human text
> > overall delta just an ignored-prelude. We can check the prelude matches
> > the binary data when processing, or encourage people to run the bundle
> > through bzr
> 
> This is a good description of what I'm working on now, but I think that
> checking preludes should always be done.

I think that checking depends on the context.

The places to check the prelude are those where we can reasonably expect
that it's being used as 'previewed data':
'bzr patch BUNDLE' or 'cat BUNDLE | bzr patch'

When it's being used the way any foreign branch is used, we should not
need to check the prelude:
'bzr pull BUNDLE'
'bzr missing BUNDLE'
'bzr merge BUNDLE'
'bzr diff BUNDLE'

Checking preludes is conceptually hard simply because of line endings:
email transmission will munge line endings, so prelude checking cannot
be binary based; it has to do whitespace-tolerant diffs and cope with
other such complications. If the patch content is going to be shown in
another fashion (for instance, pull can generate a patch to show as it
goes - I think we've had this requested as a feature) then checking the
prelude is duplicate effort.
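
To illustrate what a whitespace-tolerant check might involve, here is a
minimal sketch. It assumes the prelude and a patch regenerated from the
binary data are both available as text; the function names are
hypothetical and are not part of bzr.

```python
def normalize_patch_lines(text):
    """Return patch lines with line endings and trailing whitespace removed."""
    # splitlines() treats \n, \r\n and \r alike, so endings munged in
    # transit no longer matter; rstrip() drops trailing spaces and tabs.
    lines = [line.rstrip() for line in text.splitlines()]
    # Ignore blank lines a mail client may have appended at the end.
    while lines and not lines[-1]:
        lines.pop()
    return lines


def prelude_matches(prelude_text, regenerated_text):
    """True if both patches are equal once whitespace damage is ignored."""
    return (normalize_patch_lines(prelude_text)
            == normalize_patch_lines(regenerated_text))
```

This only tolerates line-ending and trailing-whitespace changes; a real
check would need to decide how much more damage (re-wrapping, tab
expansion) it is willing to forgive.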

> > ... Alternatively we could have no human readable section but
> > a specific mime type so mutt etc can get bzr to format it for human
> > reading.
> 
> I worry about the potential for mischief.  People could create a
> binary-only bundle that looked like it had a prelude that would be
> validated, when in fact, it wouldn't be validated.  And the user
> wouldn't see that.
> 
> For binary-only bundles, is there a need for base64?

If I mail a binary bundle to you, and you save it to disk and then use
it from there, then no. That's an interesting attack you describe there.
A similar attack was found against many GnuPG-using applications
recently. What can we do about it?

The first thought that comes to mind is that the data section of the
bundle should always be binary only; that is, the data shouldn't change
whether or not you have a prelude (this is why I've been calling it a
prelude - it comes before :)). This would make checking preludes
something that cannot be disabled by toggling a flag in the content - it
will always happen according to whatever policy we have agreed on/the
user has set.
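
A rough sketch of that separation, under stated assumptions: the marker
string, the function names and the `verify` callback below are all
hypothetical, not an existing bzr format or API. The point it shows is
that the data bytes are identical with or without a prelude, and that
the decision to check comes from the caller, never from the bundle.

```python
# Assumed marker separating prelude from data; not a real bzr format string.
DATA_MARKER = b"# begin hypothetical bundle data\n"


def split_bundle(raw):
    """Split raw bundle bytes into (prelude, data) at the first marker."""
    prelude, marker, data = raw.partition(DATA_MARKER)
    if not marker:
        raise ValueError("no data section found")
    return prelude, data


def read_bundle(raw, check_prelude, verify):
    """Return the data section, checking the prelude if policy says so.

    check_prelude is the caller's policy; nothing in the bundle content
    can switch it off. verify(prelude, data) is a caller-supplied check.
    """
    prelude, data = split_bundle(raw)
    if check_prelude and prelude and not verify(prelude, data):
        raise ValueError("prelude does not match bundle data")
    return data
```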

> > AIUI we want bundles to have the following properties:
> >  - compact representation
> >  - able to be used without their contained data being added to
> > repositories
> 
> ^^^ This was not one of my goals.

Do you object to it being a goal?

> >  - fast to create
> >  - fast to extract data from
> 
> I'm trying to accomplish fast installation, (e.g. of knit records), not
> fast extraction of fulltexts.  And I'm specifically choosing size over
> speed, because of how bundles are usually used.

I think these goals are aligned; fast installation, if you do not ship
ready-to-use repository data (e.g. knit gz hunks), implies creating a
fulltext and doing a regular knit insert as quickly as possible. Unless
I've missed something :).

> > To me it makes sense that bundles should then be a
> > 'branch-and-repository-in-a-file':
> >  - one with the derivable data discarded
> >  - one where the data within it has been stored in the most compact
> > representation
> >  - one that only contains partial history data
> 
> Well, actually making them a branch is a new idea to me.  With the
> Branch API as fat as it is, I don't know if that'll really be
> productive.  It would work best if it was readonly, and if we only had
> to simulate the control files, it might be manageable.

Right. We probably want to split out the interface, as we have with Tree
and MutableTree, to accommodate the RevisionTree and WorkingTree
variations.

I'm also thinking that it would be nice if everything for a merge
directive could live in-branch ready to be used. 

> > The actual representation might look something like:
> > the bundle is a pack file.
> > The pack contains records named:
> > branch-data
> >   tip revision id
> 
> ^^^ not strictly necessary-- the tip is already provided as "target
> revision" with bundles.  But Branch6 stores tip revision_id *and revno*,
>  so storing that would make sense.

Right - this is a minor detail; either way it can implement this part of
the Branch API as long as some composite repository is available.

> >   branch nick
> 
> ^^^ Wouldn't it be better to just put the whole config file in a record?
>  With dirstate-tags, the config already controls 90% of the behavior.

Sure.

> > branch-tags
> >   tag ...
> > [repository data here, representation still unclear to me, though I'm
> > sure Aaron has some solid thoughts on this as he has been hacking on
> > something that is conceptually very similar]
> 
> Current representation is
> - multiparent diffs of file texts, named like so: file:revisionid/fileid
> - multiparent diffs of inventories, named like so: inventory:revisionid
> - fulltexts of revisions, named like so: revision:revisionid
> - testament of the tip revision
> 
> Support for revision signatures should come in the next day or two.
> 
> Also, a table-of-contents, or at least a count of the number of records
> is probably a good idea, so we can indicate progress more easily.

I think we can do that based on the percentage of the file we've read.
That is both more accurate than a count of items and avoids forcing a
pre-calculation step onto bundle creation. Generally speaking I think we
want to move away from percentage indicators and towards 'amount of work
done' reporting; where we happen to have more information let's display
it, but let's not do excess work [unless it's key to that part of the
UI].
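
As a sketch of reporting progress from bytes read rather than from a
record count, assuming the bundle is read through a file-like object
whose total size is known up front (the `CountingReader` class is
hypothetical, not part of bzr):

```python
import io


class CountingReader:
    """Wrap a file-like object and track how many bytes have been read."""

    def __init__(self, fileobj, total_size):
        self._fileobj = fileobj
        self.total_size = total_size
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._fileobj.read(size)
        self.bytes_read += len(chunk)
        return chunk

    def fraction_done(self):
        """Fraction of the file consumed so far, between 0.0 and 1.0."""
        if not self.total_size:
            return 1.0
        return self.bytes_read / self.total_size


# Usage: no table of contents is needed, only the size of the file.
payload = b"x" * 100
reader = CountingReader(io.BytesIO(payload), len(payload))
reader.read(25)
print("read %d of %d bytes" % (reader.bytes_read, reader.total_size))
```

The same counter gives an 'amount of work done' figure (bytes so far)
when the total size is not known, for example when reading from a pipe.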

> > If this is agreeable, I'll create a design document trying to ensure we
> > have solid motivation and so on documented.
> 
> Yeah, generally speaking, it feels like a good move.

I'm now writing up what I put forward, with changes inspired by your
reply.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.