Plans for post 0.8

Aaron Bentley aaron.bentley at utoronto.ca
Wed May 10 15:57:57 BST 2006


0.8 includes (at least) two major changes to bzr: knits and the
repository/checkout workflow.  These feel like real progress toward
where we want to be.

I thought it would be useful to list some of the other major projects
that I can see us working on to improve bzr.

Major architecture changes
##########################

Nested trees
============
We need a mechanism to support working on projects that include
subprojects; for example, a project that consists of a library plus a
client of that library.  Right now, that mechanism is ConfigManager, but
we think nested trees would be a much easier way to operate.

History Horizons
================
Currently, bzr always makes a full copy of revision history when
branching.  This lets bzr developers make some convenient assumptions,
but it is not especially useful for users: the older a revision is, the
less likely it is to be useful.  In fact, annotation-based merges like
KnitMerge do not require the basis version to be stored, so perhaps we
can get rid of the need for old revisions in merging.

Shallow checkouts
-----------------
History horizons will let us produce shallow branches that contain only
the last ancestor.  That will allow us to do "shallow checkouts", which
have the performance characteristics of heavy checkouts but download and
storage requirements similar to lightweight checkouts.  That could let
us replace both heavy and lightweight checkouts with a single kind,
which would be a nice UI improvement.
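To make the idea concrete, here is a minimal Python sketch of how a
history horizon might limit what a shallow branch fetches.  It assumes
the revision graph is available as a plain dict mapping each revision id
to its parents; ``revisions_within_horizon`` and the example revision ids
are hypothetical, not bzrlib API.

```python
from collections import deque
from typing import Dict, List, Set


def revisions_within_horizon(parents: Dict[str, List[str]], tip: str,
                             depth: int) -> Set[str]:
    """Revisions reachable from `tip` in at most `depth` parent steps.

    Hypothetical helper.  Breadth-first, so each revision is first reached
    along its shortest path from the tip.
    """
    keep: Set[str] = set()
    queue = deque([(tip, 0)])
    while queue:
        rev, dist = queue.popleft()
        if rev in keep:
            continue
        keep.add(rev)
        if dist < depth:
            # Only follow parents while we are still inside the horizon.
            queue.extend((p, dist + 1) for p in parents.get(rev, []))
    return keep


# Hypothetical revision graph: r3 merged a side branch.
graph = {
    "r4": ["r3"],
    "r3": ["r2", "side1"],
    "r2": ["r1"],
    "side1": ["r1"],
    "r1": [],
}
print(sorted(revisions_within_horizon(graph, "r4", 0)))  # ['r4']
print(sorted(revisions_within_horizon(graph, "r4", 2)))  # ['r2', 'r3', 'r4', 'side1']
```

With ``depth=0`` only the tip is kept, which corresponds to the shallow
branch case; larger depths keep progressively more recent history.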

History pruning
---------------
History horizons may also be useful for projects that want to prune
their history, whether due to storage requirements, skeletons in
closets, or whatever reason.

CherryPicking
=============
Monotone's Nathaniel Smith breaks the concept of CherryPicking into two
halves: CherryPatching (that is, applying only some of the changes in a
branch) and CherryRecording (that is, recording which patches have been
applied).  We're the opposite of Arch here: we have merge technology
much better than replay --skip-present, but we don't record cherrypicks,
so we can't take full advantage of this tech.
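A minimal sketch of the CherryRecording half, assuming a branch simply
keeps a set of revision ids it has cherrypicked alongside those it has
fully merged.  ``BranchHistory`` and its methods are hypothetical names
for illustration only; the point is that once cherrypicks are recorded,
a later merge can skip them instead of applying them a second time.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class BranchHistory:
    """Hypothetical per-branch record of what has already been applied."""
    merged: Set[str] = field(default_factory=set)        # revisions merged in full
    cherrypicked: Set[str] = field(default_factory=set)  # revisions applied individually

    def record_cherrypick(self, revision_ids: Iterable[str]) -> None:
        # CherryRecording: remember which revisions' changes were applied.
        self.cherrypicked.update(revision_ids)

    def revisions_to_apply(self, source_revisions: List[str]) -> List[str]:
        # A later merge can skip anything merged or cherrypicked before.
        applied = self.merged | self.cherrypicked
        return [rev for rev in source_revisions if rev not in applied]


history = BranchHistory(merged={"b1"})
history.record_cherrypick(["b3"])
print(history.revisions_to_apply(["b1", "b2", "b3", "b4"]))  # ['b2', 'b4']
```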

Async/batched operation
=======================
We can and should optimize knits, and that could bring branching down
to about 4 minutes (see below).  But to get below 4 minutes without a
smart server, we'd need to eliminate latency through async requests
and/or pipelining.  Each of the core developers has had a go at this, so
clearly we all want it; we just disagree on the right approach.
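To show why pipelining helps, here is a small ``asyncio`` sketch that
compares waiting for each reply before sending the next request with
keeping a bounded window of requests in flight.  ``fake_fetch`` is a
hypothetical stand-in that just sleeps for one simulated round trip; it
is not a real transport, and the window size of 8 is an arbitrary
assumption.

```python
import asyncio
import time
from typing import List

SIMULATED_RTT = 0.02  # seconds; stand-in for one network round trip (assumption)


async def fake_fetch(path: str) -> str:
    """Hypothetical stand-in for one HTTP GET: costs one round trip."""
    await asyncio.sleep(SIMULATED_RTT)
    return f"contents of {path}"


async def fetch_sequential(paths: List[str]) -> List[str]:
    results = []
    for path in paths:
        # Wait for each reply before sending the next request.
        results.append(await fake_fetch(path))
    return results


async def fetch_pipelined(paths: List[str], window: int = 8) -> List[str]:
    limit = asyncio.Semaphore(window)  # at most `window` requests in flight

    async def one(path: str) -> str:
        async with limit:
            return await fake_fetch(path)

    # gather() returns results in the same order as `paths`.
    return list(await asyncio.gather(*(one(p) for p in paths)))


def timed(coro) -> float:
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


paths = [f"file{i}" for i in range(16)]
print(f"sequential: {timed(fetch_sequential(paths)):.2f} s")
print(f"pipelined:  {timed(fetch_pipelined(paths)):.2f} s")
```

With 16 paths and a 20 ms simulated round trip, the sequential loop pays
roughly 16 round trips and the pipelined version roughly 2 (16 requests,
8 at a time).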

Less major stuff that's still nice to have
##########################################

Changesets
==========
Changeset support is already written; we need to update it and get it
into the core.

Patience sequence matching
==========================
All experience so far has shown that the Patience-based matcher is often
better than, and never worse than, the one built into Python (difflib).
Using it should improve diff, merge and annotate, and might reduce the
size of repositories slightly.
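For readers who haven't seen it, here is a compact Python sketch of the
core of patience matching: anchor on lines that occur exactly once in
both texts, keep the longest set of those anchors that appear in the
same order in both (found with patience sorting), then recurse into the
gaps between anchors.  This is an illustrative reimplementation, not the
matcher the post refers to; a real one also extends matches through runs
of identical lines and falls back to another algorithm when a region has
no unique lines.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence, Tuple


def unique_lcs(a: Sequence[str], b: Sequence[str]) -> List[Tuple[int, int]]:
    """Pair up lines unique in both a and b, keeping the longest subset
    whose order agrees in both (longest increasing subsequence)."""
    index_a: Dict[str, Optional[int]] = {}
    for i, line in enumerate(a):
        index_a[line] = None if line in index_a else i  # None marks duplicates
    index_b: Dict[str, Optional[int]] = {}
    for j, line in enumerate(b):
        index_b[line] = None if line in index_b else j
    pairs = [(i, index_b[line]) for i, line in enumerate(a)
             if index_a[line] == i and index_b.get(line) is not None]

    # Patience sorting: tails[k] is the pair ending the best increasing
    # run of length k + 1; back[] links each pair to its predecessor.
    tails: List[int] = []
    tail_js: List[int] = []
    back: List[Optional[int]] = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tail_js, j)
        if k > 0:
            back[idx] = tails[k - 1]
        if k == len(tails):
            tails.append(idx)
            tail_js.append(j)
        else:
            tails[k] = idx
            tail_js[k] = j
    result: List[Tuple[int, int]] = []
    node = tails[-1] if tails else None
    while node is not None:
        result.append(pairs[node])
        node = back[node]
    return result[::-1]


def patience_matches(a: Sequence[str], b: Sequence[str],
                     a_lo: int = 0, a_hi: Optional[int] = None,
                     b_lo: int = 0, b_hi: Optional[int] = None
                     ) -> List[Tuple[int, int]]:
    """Matched (i, j) line pairs between a and b, in order."""
    a_hi = len(a) if a_hi is None else a_hi
    b_hi = len(b) if b_hi is None else b_hi
    anchors = unique_lcs(a[a_lo:a_hi], b[b_lo:b_hi])
    if not anchors:
        return []  # a full matcher would fall back to another algorithm here
    matches: List[Tuple[int, int]] = []
    prev_a, prev_b = a_lo, b_lo
    for i, j in anchors:
        i, j = i + a_lo, j + b_lo
        # Lines that repeat globally may be unique inside this smaller gap.
        matches.extend(patience_matches(a, b, prev_a, i, prev_b, j))
        matches.append((i, j))
        prev_a, prev_b = i + 1, j + 1
    matches.extend(patience_matches(a, b, prev_a, a_hi, prev_b, b_hi))
    return matches


print(patience_matches(["a", "b", "c"], ["c", "a", "b"]))  # [(0, 1), (1, 2)]
```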

Knit optimization
=================
My bzr repository has 1663 files in it, and I get a ping time of 152 ms
to bazaar-vcs.org.  If latency is our limiting factor, I should be able
to download a branch of bzr in 4 minutes 12 seconds over plain HTTP,
without using async requests.  Currently it takes more like 40 minutes,
so there seems to be plenty of room for improvement.
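A quick check of the arithmetic behind the 4 minutes 12 seconds figure,
assuming (as the estimate implicitly does) exactly one round trip per
file and no bandwidth cost:

```python
FILES = 1663               # files in the bzr repository, from the text above
PING_SECONDS = 0.152       # 152 ms round trip to bazaar-vcs.org
CURRENT_SECONDS = 40 * 60  # "more like 40 minutes"

# Assumption: exactly one round trip per file, transfer time ignored.
latency_bound = FILES * PING_SECONDS
minutes, seconds = divmod(latency_bound, 60)
print(f"latency bound: {int(minutes)} min {seconds:.1f} s")        # latency bound: 4 min 12.8 s
print(f"current / bound: {CURRENT_SECONDS / latency_bound:.1f}x")  # current / bound: 9.5x
```

So the current 40 minutes is roughly nine and a half times the
latency-only bound.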

TreeTransform Rollback
======================
TreeTransform will die messily if the transform encounters an error
(like out of space or permission denied) partway through.  The actual
application of a transform is just renaming and deleting files, so the
operation is relatively straightforward to reverse if an error is
encountered.

It would mean renaming files that are to be deleted, and deferring their
deletion until the transform has been successfully applied.  That would
increase the number of system calls, but system calls do not currently
dominate transform application time.
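A minimal sketch of the rollback idea in plain Python, assuming a
transform can be reduced to a list of deletions and a list of renames,
and that rename targets do not already exist.  ``apply_with_rollback``
is a hypothetical helper, not TreeTransform's API: deletions are first
renamed into a holding directory, every completed step is journaled,
and on an ``OSError`` the journal is replayed backwards.  Files are only
really deleted once everything has succeeded.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


def apply_with_rollback(work_dir: str, deletions: List[str],
                        renames: List[Tuple[str, str]]) -> None:
    """Delete `deletions`, then perform `renames`; undo everything on OSError.

    Hypothetical helper.  The holding directory lives inside `work_dir`
    so it is on the same filesystem and os.rename() can move files into it.
    Assumes rename targets don't already exist (os.rename may overwrite).
    """
    holding = tempfile.mkdtemp(prefix=".pending-delete-", dir=work_dir)
    journal: List[Tuple[str, str]] = []  # completed (from, to) renames, in order
    try:
        for n, path in enumerate(deletions):
            parked = os.path.join(holding, str(n))
            os.rename(path, parked)  # "delete" by moving aside, so it can be undone
            journal.append((path, parked))
        for src, dst in renames:
            os.rename(src, dst)
            journal.append((src, dst))
    except OSError:
        for src, dst in reversed(journal):
            os.rename(dst, src)  # undo completed steps in reverse order
        os.rmdir(holding)        # empty again after the undo
        raise
    shutil.rmtree(holding)  # success: now really delete the parked files
```

The extra cost is one rename per deleted file plus one directory
creation and removal, in line with the point above that system calls do
not dominate.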

Merge updates
=============

Identity knit merge
-------------------
I'd like to update knit merge to use annotation information in lieu of
line-identity, with mandatory reprocessing to support convergence.  That
merge should then be used for all VersionedFiles, because the repository
type should not affect the merge behaviour (and weaves are deprecated).

Edge merge
----------
I'd also like to take a stab at edge-based merging, because:

1. it can support text movement (i.e. swapped blocks) within a file
   (perhaps ultimately it could support text movement among files)
2. edge annotation can determine which blocks have replaced other
   blocks.  This is useful for --show-base, and also for annotate-style
   "code archaeology".

History-based scalar merge
--------------------------
While we have history-based text merges, our filesystem merges are all
three-way merges, which means they require a base revision and are
subject to the criss-cross problems that affect three-way merges.
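To illustrate the criss-cross problem for scalars, here is a minimal
three-way merge of a single value such as a file name;
``three_way_scalar`` is a hypothetical helper.  With two equally valid
bases, as can happen after a criss-cross merge, the same pair of values
merges cleanly to different results depending on which base is picked,
which is the ambiguity a history-based approach would avoid.

```python
from typing import Tuple, TypeVar

T = TypeVar("T")


def three_way_scalar(base: T, this: T, other: T) -> Tuple[T, bool]:
    """Merge one scalar value; return (result, is_conflict)."""
    if this == other:
        return this, False   # both sides agree (or neither changed it)
    if this == base:
        return other, False  # only OTHER changed it
    if other == base:
        return this, False   # only THIS changed it
    return this, True        # both changed it differently: conflict, keep THIS


# After a criss-cross merge there can be two equally good bases.
this_name, other_name = "foo.c", "bar.c"
print(three_way_scalar("foo.c", this_name, other_name))  # ('bar.c', False)
print(three_way_scalar("bar.c", this_name, other_name))  # ('foo.c', False)
```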

Spurious conflict avoidance
---------------------------
We should also support merging across line-ending changes and
indentation changes.
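One possible approach, sketched here with Python's ``difflib`` rather
than the Patience matcher: compare lines under a key that ignores the
line ending and leading indentation, so lines that differ only in those
respects count as unchanged.  ``comparison_key`` and ``changed_lines``
are hypothetical names; a real merge would still have to decide which
line ending and indentation to keep in its output.

```python
import difflib
from typing import List


def comparison_key(line: str) -> str:
    # Ignore the line ending (LF vs CRLF) and leading indentation (spaces/tabs).
    return line.rstrip("\r\n").lstrip(" \t")


def changed_lines(old: List[str], new: List[str]) -> List[int]:
    """Indices of lines in `new` that changed beyond line ending or indentation."""
    matcher = difflib.SequenceMatcher(
        None,
        [comparison_key(line) for line in old],
        [comparison_key(line) for line in new],
        autojunk=False,
    )
    changed: List[int] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed.extend(range(j1, j2))
    return changed


old = ["if x:\n", "    y = 1\n", "z = 2\n"]
new = ["if x:\r\n", "\ty = 1\r\n", "z = 3\r\n"]
print(changed_lines(old, new))  # [2]
```

Ignoring indentation is only safe for some file types (in Python source
it changes meaning), so that part would probably need to be optional.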

Summary
#######
We most likely won't have all of these done for our 2.0 release (1.0 is
already taken by the old Bazaar).  I find it useful to write these
things down, and thought I would share them with you.

My priorities are:

1. Changesets
2. Patience sequence matching
3. Nested trees
4. CherryPicking

I'm not claiming any of these for myself, though.  They're the things I
see as important to get done soon, so I'd welcome anyone else doing them.

Aaron



