BTree + CHK Inefficiencies

Maritza Mendez martitzam at gmail.com
Sat Aug 7 17:50:23 BST 2010


On Sat, Aug 7, 2010 at 6:35 AM, Gordon Tyler <gordon.tyler at gmail.com> wrote:

> My other thought was to separate packs by the size of the content. i.e.
> files < 1MB go in one pack and files > 1MB go in another, maybe with a
> few more levels as necessary. The small files which change a lot in
> comparison to the large files wouldn't cause repacking of the large files.
>
>
FWIW, we do something like this in our own code.  A single "project"
created by our software is typically tens of gigabytes.  Ideally, a
project would be broken up into dozens of files, some big, some small,
for fast, easy access.  But users need to be able to re-open old
projects and share projects
with their collaborators.  So they demand One Big File.  And we compress it,
in chunks.  We use a couple of acceleration methods to minimize the amount
of unchunking we have to do before our GUI can continue interacting with the
user.  First, we try to organize the chunks in a logarithmic size
progression.  Second, in some cases -- and this can have a huge impact
on response time -- we chunk together files in a logarithmic
progression of
access frequency.  We have an advantage that bzr does not -- we can guess
pretty well which files will be needed early and often, since our app
provides a controlled environment.  So we never repack, although we have
found a few cases where it might help.  I'm not claiming anything special.
I'm pretty sure these ideas would seem quaint to database and compression
gurus.  Whether any of these ideas are of value in bzr I do not dare guess.
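
To make that concrete, here is a minimal sketch of the idea in Python.
The names and the exact bucketing rule are invented for illustration
(our real code looks nothing like this); the point is just that both
size and access frequency are bucketed by the integer part of their
logarithm, so each bucket spans a similar ratio of values rather than
a similar range:

    import math
    from collections import defaultdict

    def log_bucket(value, base=2, floor=1):
        # Bucket a non-negative number by the integer part of its log.
        # Boundaries grow geometrically: floor, base*floor,
        # base**2 * floor, and so on.
        if value <= floor:
            return 0
        return 1 + int(math.log(value / floor, base))

    def plan_chunks(files):
        # `files` is an iterable of (name, size_bytes, accesses_per_day)
        # tuples.  Files of similar size and similar "temperature" land
        # in the same chunk, so unchunking one hot chunk tends to pull
        # in exactly the files the GUI will want next.
        chunks = defaultdict(list)
        for name, size, freq in files:
            key = (log_bucket(size, floor=1024),  # 1 KB size floor
                   log_bucket(freq))
            chunks[key].append(name)
        return chunks

With that rule, a small hot settings file and a huge cold data blob get
keys like (3, 8) and (21, 0), so they never share a chunk:

    plan_chunks([("ui_state.xml", 4096, 200),
                 ("scan_volume.raw", 2000000000, 1)])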

~M