Low level storage update status

Robert Collins robertc at robertcollins.net
Thu Jul 31 07:57:51 BST 2008


So I've been doing some experimentation on real low-level storage
changes, aided by John and with various other folk doing testing and
giving feedback.

I think now is a good time to summarise what we've got on hand, and
what seem like reasonable things to aim for in the 1.7 cycle and
beyond.

Index layer
-----------

We have a B+Tree-based index layer that performs much better overall
than the existing index facilities. It has optional bloom filters,
though I'm not currently convinced that they fit our use case well
enough to want to use them.
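
For context: a bloom filter gives a cheap "definitely not present"
answer before any index pages are read, at the cost of occasional
false positives and some extra bytes in the index. A minimal sketch
of the idea (the names here are illustrative, not the btree_index
API):

    import hashlib

    class BloomFilter:
        # A hit may be a false positive; a miss is definite, so a
        # miss lets a reader skip the disk lookup entirely.
        def __init__(self, size_bits=8192, num_probes=4):
            self.size_bits = size_bits
            self.num_probes = num_probes
            self.bits = 0

        def _probes(self, key):
            digest = hashlib.sha1(key).digest()
            for i in range(self.num_probes):
                two = digest[2 * i:2 * i + 2]
                yield int.from_bytes(two, 'big') % self.size_bits

        def add(self, key):
            for bit in self._probes(key):
                self.bits |= 1 << bit

        def may_contain(self, key):
            return all(self.bits >> bit & 1
                       for bit in self._probes(key))

    f = BloomFilter()
    f.add(b'file-id rev-1')
    assert f.may_contain(b'file-id rev-1')

The open question is whether our rate of negative lookups is high
enough to pay for the filter's space in every index.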

Pros:
 It is memory-capped during insertion and extraction of data, which
will reduce swapping (at the cost of re-reading data, but it's better
to do that than to thrash with a garbage collector keeping everything
hot). It performs less IO to find a given key, and the indices are
about 30% of the size of our current indices.
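
Roughly, the cap during insertion has the shape of an external sort:
buffer entries up to a limit, spill sorted runs to disk, and merge
the runs when writing the final nodes. A toy sketch of that shape
(made-up names, not the real writer; keys and values assumed to be
strings):

    import heapq
    import tempfile

    class SpillingIndexWriter:
        # Keep at most `limit` entries in memory; spill sorted runs
        # to temp files, so peak memory stays flat however large the
        # index grows.
        def __init__(self, limit=100000):
            self.limit = limit
            self.buffer = []
            self.runs = []

        def add(self, key, value):
            self.buffer.append((key, value))
            if len(self.buffer) >= self.limit:
                self._spill()

        def _spill(self):
            run = tempfile.TemporaryFile(mode='w+')
            for key, value in sorted(self.buffer):
                run.write('%s\t%s\n' % (key, value))
            run.seek(0)
            self.runs.append(run)
            self.buffer = []

        def iter_sorted(self):
            # Merge the in-memory tail with the on-disk runs without
            # ever holding the whole index in memory at once.
            streams = [sorted(self.buffer)]
            for run in self.runs:
                streams.append(tuple(line.rstrip('\n').split('\t'))
                               for line in run)
            return heapq.merge(*streams)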

Cons:
 Insertion is slightly slower than with the current GraphIndex; the
current GraphIndex format is trivial to output from Python, and its
lack of concern for memory use gives better performance on small
indices. (When you start thrashing, the B+Tree index performs
better.) I have a test case where very large indices drop from weeks
of work down to a few hours :).

Status:
 Basically done. There is a test repository format using this index,
and it would probably be a day's work to create a bzr merge request
incorporating it into bzr core (stripping out the bits we are not
using, etc.). Landing this in a development format is definitely
worth doing.

Decisions needed:
 - use blooms or not


Compression Layer
-----------------

I've created a thing called GroupCompress, which is quite different
from knits and can save significant space as well as simplifying
stacking and tree building. The basic concept is to compress a series
of texts into one stream rather than into a series of incremental
patches. This means that a series of texts ABC, AC, ABC, BA needs
only one copy of the ABC content, and after that can just emit
references. I'm still working on clean integration with the core, but
John has measured bzr's own repository shrinking from 85MB to 31MB.
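
To make the ABC, AC, ABC, BA example concrete, here is a toy version
of the scheme. The real thing works at the byte level and compresses
the resulting stream, but the shape is the same: store content once,
emit references afterwards.

    def compress_group(texts):
        # Toy group compressor: each text is a sequence of chunks.
        # A chunk already present in the group's stream is emitted
        # as a reference; a new chunk is appended exactly once.
        stream = []        # the group's content, in append order
        seen = {}          # chunk -> position in stream
        per_text = []      # instructions to rebuild each text
        for text in texts:
            ops = []
            for chunk in text:
                if chunk not in seen:
                    seen[chunk] = len(stream)
                    stream.append(chunk)
                ops.append(('copy', seen[chunk]))
            per_text.append(ops)
        return stream, per_text

    def extract(stream, ops):
        return [stream[pos] for _, pos in ops]

    stream, per_text = compress_group(
        [['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['B', 'A']])
    # stream is ['A', 'B', 'C']: one copy of the ABC content; every
    # later occurrence of A, B or C is just a reference into it.
    assert extract(stream, per_text[3]) == ['B', 'A']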

Conceptually, a gc repository should have the same access speed for
individual revisions (revisions don't compress much against each
other, so it's worth questioning whether we want them in a group or
stored individually). Inventory and text access should be faster,
both on average and for specific texts. Annotation will be slower,
because annotation today depends both on text access speed and on
cached data that we reuse from knits.


Pros:
 - Smaller size: This has numerous implications; initial clone should
   get 3 times faster in bandwidth-limited situations, it's more
   cache-friendly, and it's nicer for hosting services.
 - Simpler text extraction: Rather than up to 200 index queries to
   get a given text out, gc repositories do a single query per text,
   which is on average about 1% of the index-layer overhead (see the
   sketch after this list). Further, text extraction does not need to
   hold as many objects in memory, which should drop peak memory
   usage, particularly for operations like annotate and tree
   building.
 - More robust: By not depending on deltas in separate packs, and
   (probably) doing full sha validation on every push and pull, we
   will catch bit-errors earlier than we do today. Performance
   considerations may change this point between now and having a
   merge-ready bzr branch.
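
The extraction sketch promised above: a single index query yields the
group and the text's position inside it, and the text is then just a
slice of the decompressed group. The structures here are
hypothetical, purely to show the shape:

    import zlib

    def extract_text(index, groups, key):
        # The single index query: key -> (group id, start, length).
        # No delta chain to walk, no ancestor texts held in memory.
        group_id, start, length = index[key]
        plain = zlib.decompress(groups[group_id])
        return plain[start:start + length]

    # Hypothetical layout: one compressed stream per group, plus an
    # index giving each key's position in its group's plain bytes.
    groups = {0: zlib.compress(b'ABCAC')}
    index = {('file-id', 'rev-1'): (0, 0, 3),    # ABC
             ('file-id', 'rev-2'): (0, 3, 2)}    # AC
    assert extract_text(index, groups, ('file-id', 'rev-1')) == b'ABC'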

Cons:
 - We can't just copy deltas around as we do today, because deltas
   are part of a stream rather than separate objects. This has
   ramifications for fetch, pack and autopack. I believe the
   tradeoffs are in favour of groupcompress, but it's not a
   no-brainer. We can reuse some parts of a stream intact (e.g. the
   prefix of a stream), which helps network operations, but packing
   will probably want to recompress most of the time.
 - Annotate will become slower in a simple implementation, because
   the get_matching_blocks acceleration we have on knits is not
   available.
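
For reference, the acceleration being lost: annotating a child
against an already-annotated parent boils down to
get_matching_blocks, and knits let us reuse cached match data instead
of recomputing it. A naive version that recomputes the match, which
is roughly what a simple gc-based annotate would pay per revision:

    from difflib import SequenceMatcher

    def annotate_child(parent_lines, parent_origins, child_lines,
                       child_rev):
        # Lines matched against the parent inherit its origins; the
        # rest are attributed to the child revision. Without cached
        # match data, this SequenceMatcher pass runs for every
        # revision annotated.
        origins = [child_rev] * len(child_lines)
        matcher = SequenceMatcher(None, parent_lines, child_lines)
        for i, j, n in matcher.get_matching_blocks():
            for k in range(n):
                origins[j + k] = parent_origins[i + k]
        return origins

    parent = ['a\n', 'b\n']
    child = ['a\n', 'new\n', 'b\n']
    assert annotate_child(parent, ['rev-1', 'rev-1'], child,
                          'rev-2') == ['rev-1', 'rev-2', 'rev-1']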

Status:
 GroupCompress is in the early days of development. Right now the
basic concept has been proven and we have a promising outer envelope
for performance. We don't yet have enough of the rough edges in the
integration fixed to start doing useful comparisons with knit-based
fetches or tree building. There are only a few such core bugs left to
fix (such as offering reverse-topological sort as an option, and
knits buffering way too much, which makes conversion slow).

Decisions needed:
 - hack up xdelta?
 - serialization tweaks to optimise a C-based decompressor
 - zlib data-limit approximation
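
On the last item: if part of the question is how much of a group a
reader must inflate, note that zlib's decompressobj can already be
capped via the max_length argument to decompress(), so a text near
the front of a group need not cost a whole-group decompression.
Roughly:

    import zlib

    def read_plain_prefix(compressed, needed_bytes):
        # Inflate only the first needed_bytes; the rest of the
        # compressed stream stays in unconsumed_tail, untouched.
        d = zlib.decompressobj()
        out = d.decompress(compressed, needed_bytes)
        while len(out) < needed_bytes and d.unconsumed_tail:
            out += d.decompress(d.unconsumed_tail,
                                needed_bytes - len(out))
        return out

    group = zlib.compress(b'ABCACBA' * 1000)
    assert read_plain_prefix(group, 3) == b'ABC'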

I intend to keep plugging away at these formats over the next couple
of weeks. What would be useful for me is for folk to test the gc
plugin and file bugs tagged groupcompress and btree if they have
trouble. Running usertest over a gc-plain repository would be
interesting, though right now probably not terribly surprising :).

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

