[RFC] Bundles as repositories

Robert Collins robertc at robertcollins.net
Tue Jun 19 05:21:12 BST 2007


On Mon, 2007-06-18 at 22:52 -0400, Aaron Bentley wrote:


> Robert Collins wrote:
> > On Fri, 2007-06-15 at 03:20 -0400, Aaron Bentley wrote:
> 


> > Checking of preludes is conceptually hard simply because of line
> > endings: email transmission will munge line endings and thus prelude
> > checking cannot be binary-based; it has to do whitespace-tolerant diffs
> > and other such complications.
> 
> I think it's conceptually quite simple.  It's like doing
> case-insensitive comparisons by turning both inputs into lowercase.
> 
> Do the absolute maximum damage that can be done to it via whitespace
> munging, then record the sha1 sum of the munged prelude.
> 
> Apply the same damage on the other side, and compare the resulting sha1.
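A minimal Python sketch of the munge-then-hash idea. Which transformations count as the "absolute maximum damage" is an assumption here: only line-ending conversion, trailing whitespace and trailing blank lines are normalized. The function names are hypothetical, not bzrlib API.

```python
import hashlib


def munge(prelude: bytes) -> bytes:
    """Apply the worst whitespace damage we assume email transport can do."""
    # Assumption: mailers may convert line endings and add or strip
    # trailing whitespace; nothing else is normalized here.
    text = prelude.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = [line.rstrip() for line in text.split(b"\n")]
    while lines and not lines[-1]:
        lines.pop()  # trailing blank lines are also at risk
    return b"\n".join(lines)


def prelude_sha1(prelude: bytes) -> str:
    """Hash recorded by the sender, computed over the munged prelude."""
    return hashlib.sha1(munge(prelude)).hexdigest()


def prelude_matches(received: bytes, recorded_sha1: str) -> bool:
    """The receiver applies the same damage and compares hashes."""
    return prelude_sha1(received) == recorded_sha1


original = b"=== modified file 'foo'\n+hello\n"
mangled = b"=== modified file 'foo'  \r\n+hello\r\n\r\n"
print(prelude_matches(mangled, prelude_sha1(original)))  # True
```

The flip side is that any change confined to trailing whitespace or line endings hashes identically, so it passes the check unnoticed.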

I'll be interested to see if this works out in practice. Just hope no one
uses a whitespace-only language through this :).

> > If the patch content is going to be shown
> > in another fashion (for instance pull can generate a patch as it goes to
> > show - I think we've had this requested as a feature) then checking the
> > prelude is duplicate effort.
> 
> No it's not, because even if people have that feature turned on, they
> may not pay close attention to the diff that pull produces.

I don't see grabbing from a URL where someone looked at the bzr branch
with 'loggerhead' as *any* safer than grabbing from the URL of a bundle.
In both cases it's possible for the data the human looked at and the
data the computer received to differ. The only real way to avoid this is
to have a top-level hash that can be used to check validity after pull
(or whatever operation) completes.
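One way to read the top-level hash suggestion, sketched in Python: hash a sorted manifest of per-record sha1s, so the value a reviewer saw can be re-checked once the data has actually landed, whatever route it took. Treating the data as (key, bytes) records, and both function names, are assumptions for illustration.

```python
import hashlib


def top_level_hash(records):
    """Hash over (key, sha1-of-content) pairs, in key order.

    `records` is an iterable of (key: str, content: bytes) with unique keys.
    """
    outer = hashlib.sha1()
    for key, content in sorted(records, key=lambda record: record[0]):
        outer.update(key.encode("utf-8") + b"\0")
        outer.update(hashlib.sha1(content).hexdigest().encode("ascii") + b"\n")
    return outer.hexdigest()


def verify_after_pull(received_records, reviewed_hash):
    """True only if what arrived is exactly what the reviewer saw."""
    return top_level_hash(received_records) == reviewed_hash


reviewed = [("file-id-1", b"hello\n"), ("file-id-2", b"world\n")]
expected = top_level_hash(reviewed)
print(verify_after_pull(list(reversed(reviewed)), expected))  # True
```

Because the check runs on what was received, it does not matter whether the reviewer looked at a bundle, a loggerhead page or anything else.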

> > The first thought that comes to mind is that the data section of the
> > bundle should always be binary only; that is, the data shouldn't change
> > if you have or don't have a prelude (this is why I've been calling it a
> > prelude - it comes before :)). This would make checking preludes
> > something that cannot be disabled by toggling a flag in the content - it
> > will always happen according to whatever policy we have agreed on/the
> > user has set.
> 
> I propose we have two variants of the format.
> 
> One variant is for human consumption, has a prelude and a base64 wrapper
> on the data, and has its prelude checked by default.
>
> The other is not for human consumption, has no prelude, and is not
> base64 wrapped.  That way, it's hard to mistake one format for the other.
> 
> The data, whether base64 wrapped or not, does not change.

I think this is a reasonable approach. To be sure I've understood you:
you are proposing that we couple the presence of a prelude with base64
wrapping of the pack file?
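A rough Python sketch of the two variants as proposed: the human variant carries a prelude and base64-wrapped data, the machine variant carries the raw data only, and both decode to the same bytes. The header lines and the data marker are hypothetical placeholders, not part of any real bundle format.

```python
import base64

# Hypothetical header lines; a real format would define its own.
HUMAN_HEADER = b"# hypothetical human-readable bundle\n"
MACHINE_HEADER = b"# hypothetical machine bundle\n"
DATA_MARKER = b"# begin base64 data\n"


def write_human(prelude: bytes, data: bytes) -> bytes:
    """Prelude for people, followed by the data wrapped in base64."""
    return HUMAN_HEADER + prelude + DATA_MARKER + base64.encodebytes(data)


def write_machine(data: bytes) -> bytes:
    """No prelude and no wrapping: the data bytes as-is."""
    return MACHINE_HEADER + data


def read_bundle(raw: bytes):
    """Return (prelude or None, data); the data is identical in both variants."""
    if raw.startswith(HUMAN_HEADER):
        body = raw[len(HUMAN_HEADER):]
        # base64 text never contains '#', so the last marker is the real one.
        prelude, sep, wrapped = body.rpartition(DATA_MARKER)
        if not sep:
            raise ValueError("missing data marker")
        return prelude, base64.decodebytes(wrapped)
    if raw.startswith(MACHINE_HEADER):
        return None, raw[len(MACHINE_HEADER):]
    raise ValueError("unrecognised bundle variant")


data = bytes(range(256))
print(read_bundle(write_machine(data)) == (None, data))  # True
```

Since the decoded data is byte-for-byte the same either way, whether to check the prelude can be left to local policy on the reading side rather than to anything the file says about itself.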

> >>> AIUI we want bundles to have the following properties:
> >>>  - compact representation
> >>>  - able to be used without their contained data being added to
> >>> repositories
> >> ^^^ This was not one of my goals.
> > 
> > Do you object to it being a goal?
> 
> If we're talking about the 1.0alpha format I've been working on, yes.
> Doing that before merging it would probably mean missing the release
> window for 0.18.

Oh, sure - it's really hard to do now as we don't have many prerequisites
in place. I'm writing for the roadmap though, so we should include our
intent for <= 0.22.

> As another format, or future work, that would be fine.  However, there
> is tension between the desire for compactness and the desire to use them
> in-place, because extraction speed with no snapshots would be glacial.

I think there is a tradeoff, yes. However, if we can use the core
repository code directly on the bundle, it's a tension that will be
reflected in the very core of bzr and should be quite manageable there.
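To make the compactness/extraction tension concrete, a rough Python sketch of a delta chain that stores a fulltext snapshot every `snapshot_interval` versions. Rebuilding a version replays at most `snapshot_interval - 1` deltas; with no snapshot after the first, rebuilding version n replays n deltas. The storage layout and names are assumptions for illustration, not the knit or bundle format; deltas come from `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Replacement hunks (start, end, new_lines) that turn `old` into `new`."""
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes()
            if tag != "equal"]


def apply_delta(old, delta):
    """Apply replacement hunks, which must be sorted and non-overlapping."""
    out, pos = [], 0
    for start, end, lines in delta:
        out.extend(old[pos:start])
        out.extend(lines)
        pos = end
    out.extend(old[pos:])
    return out


def store(versions, snapshot_interval):
    """Fulltext every `snapshot_interval` versions, otherwise a delta to the predecessor."""
    stored = []
    for i, text in enumerate(versions):
        if i % snapshot_interval == 0:
            stored.append(("full", list(text)))
        else:
            stored.append(("delta", make_delta(versions[i - 1], text)))
    return stored


def extract(stored, n):
    """Rebuild version n; return (lines, number of deltas applied)."""
    start = n
    while stored[start][0] != "full":
        start -= 1
    text = list(stored[start][1])
    for _, delta in stored[start + 1:n + 1]:
        text = apply_delta(text, delta)
    return text, n - start


versions = [[f"line {j}\n" for j in range(i + 1)] for i in range(10)]
print(extract(store(versions, 4), 9)[1])  # 1
```

Each extra snapshot costs a fulltext in the file, which is exactly the space that a size-optimized bundle wants to save.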

> >>>  - fast to create
> >>>  - fast to extract data from
> >> I'm trying to accomplish fast installation, (e.g. of knit records), not
> >> fast extraction of fulltexts.  And I'm specifically choosing size over
> >> speed, because of how bundles are usually used.
> > 
> > I think these goals are aligned; fast installation if you do not ship
> > ready-to-use repository data (e.g. knit gz hunks) implies creating a
> > fulltext and doing a regular knit insert as quickly as possible. Unless
> > I've missed something :).
> 
> Single-parent MPDiffs ought to be easy to convert into knit deltas
> without extracting any fulltexts.  You'll pay the cost of gzipping, but
> not of file comparison.

If it's easy to turn into a knit delta then it's easy to use as one too -
without paying the knit unzip cost :).
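A rough Python sketch of why single-parent diffs convert cheaply: if the diff is a sequence of "new lines" and "copy lines from the parent" hunks, it maps onto knit-style (start, end, new_lines) replacements against the parent without building a fulltext or running a comparison. The tuple encodings are simplified stand-ins, not bzrlib's actual multi-parent or knit classes, and copies are assumed to appear in increasing parent order.

```python
def mpdiff_to_line_delta(hunks, parent_len):
    """Convert single-parent hunks into (start, end, new_lines) replacements.

    Hunks are ("new", lines) or ("copy", parent_pos, num_lines); this
    encoding is a simplified stand-in for a real multi-parent diff.
    """
    delta, cursor, pending = [], 0, []
    for hunk in hunks:
        if hunk[0] == "new":
            pending.extend(hunk[1])
            continue
        _, parent_pos, num_lines = hunk
        if parent_pos < cursor:
            raise ValueError("copies must be in increasing parent order")
        if parent_pos > cursor or pending:
            # Parent lines skipped since the last copy are replaced by
            # whatever new lines accumulated in between.
            delta.append((cursor, parent_pos, pending))
        cursor, pending = parent_pos + num_lines, []
    if cursor < parent_len or pending:
        delta.append((cursor, parent_len, pending))
    return delta


def apply_line_delta(parent, delta):
    """Apply (start, end, new_lines) replacements to the parent's lines."""
    out, pos = [], 0
    for start, end, lines in delta:
        out.extend(parent[pos:start])
        out.extend(lines)
        pos = end
    out.extend(parent[pos:])
    return out


parent = ["a\n", "b\n", "c\n", "d\n"]
hunks = [("copy", 0, 1), ("new", ["x\n"]), ("copy", 2, 2)]
print(mpdiff_to_line_delta(hunks, len(parent)))  # [(1, 2, ['x\n'])]
```

Only the parent's length is needed, never its content, which is why no fulltext extraction is involved.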

> And heck, I haven't ruled out bundling knit hunks either.
> 
> > I'm also thinking that it would be nice if everything for a merge
> > directive could live in-branch ready to be used.
> 
> I'm confused.  As opposed to living in the repository?

I mean the 'commit message to use' and 'branch to submit to' data being
stored in the branch, so that 'bzr merge-directive' can do exactly the
right thing with no questions asked, and so that the data is exposed via
the branch API when it's in the bundle.

> > I think we can do that based on the percentage of the file we've read,
> > which is both more accurate than an item count and avoids forcing a
> > pre-calculation step on bundle creation.
> 
> I don't know that it would be more accurate.  It's not uncommon for a
> bundle prelude to comprise 75% or more of the bundle file.  The actual
> data will be more expensive to read, I assume.

So: read the prelude, subtract its size from the total (if you have
one), then start counting as a percentage.
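The suggestion above as a small Python function: progress is measured over the data section only, so a prelude that makes up 75% of the file does not push the indicator to 75% before any real work has been done. The function name and returning None when no total is known are assumptions.

```python
def data_progress(bytes_read, prelude_size, total_size=None):
    """Fraction of the data section read so far, ignoring the prelude.

    Returns None when the total size is unknown (e.g. a streamed bundle).
    """
    if total_size is None:
        return None
    data_size = total_size - prelude_size
    if data_size <= 0:
        return 1.0
    done = max(0, bytes_read - prelude_size)
    return min(1.0, done / data_size)


print(data_progress(80, prelude_size=75, total_size=100))  # 0.2
```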

> > Generally speaking I think we
> > want to move away from percentage indicators and towards 'amount of work
> > done' reporting; where we happen to have more information, let's display
> > it, but let's not do excess work [unless it's key to that part of the UI].
> 
> It's not important now, but I'd like to get a better idea what you mean
> later.

OK, I'll start another thread ASAP.
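For comparison, a minimal Python sketch of the 'amount of work done' style described above: count units as they complete, and show a total only if one is already to hand, never running a pre-counting pass to obtain it. The class is hypothetical, not bzrlib's progress API.

```python
class WorkReporter:
    """Reports work done; shows a total only when one is already known."""

    def __init__(self, message, total=None):
        self.message = message
        self.total = total  # supplied by the caller; never computed here
        self.done = 0

    def tick(self, n=1):
        """Record n more units of completed work and return the status line."""
        self.done += n
        return self.render()

    def render(self):
        if self.total is None:
            return f"{self.message}: {self.done} done"
        return f"{self.message}: {self.done}/{self.total}"


reporter = WorkReporter("installing records")
reporter.tick()
print(reporter.tick(2))  # installing records: 3 done
```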

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.