[merge] document the pack format

John Arbash Meinel john at arbash-meinel.com
Wed Oct 24 20:48:20 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Westby wrote:
> On (24/10/07 10:53), Martin Pool wrote:
>> +obsolete_packs/      packs that have been repacked and are no 
>> +                     longer normally needed
> 
> When do packs enter here? When are they removed?
> 
> I am guessing this is in a repack operation. Does it just build up cruft?
> 
> There was a recent git conversation where they realised that all
> operations need to be automatic by default, as only 0.1% of developers
> will ever actually run the maintenance commands, if they even realise
> they are supposed to.
> 
> I applaud the automatic repack then, I would just be wary of the
> (only disk?) overhead that would arise from too much accumulating here.

It amounts to roughly log10(NumRevisions) overhead.
So if you have 10 revisions, you end up with about 2 copies of all of your data.
If you have 100 revisions, it is about 3 copies, and so on.
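
As a rough illustration of that arithmetic, here is a minimal sketch. It
assumes the logarithm is base 10, which is what the two examples above imply.
The function name estimated_copies is made up for this example and is not part
of bzr.

```python
def estimated_copies(num_revisions):
    """Rough count of data copies kept on disk for a given history size.

    Assumption: one live copy plus one obsolete copy per power of ten
    of revisions, i.e. floor(log10(n)) + 1. This only mirrors the
    examples in the text, not any real bzr calculation.
    """
    if num_revisions < 1:
        return 0
    copies = 1
    # Integer division avoids floating-point rounding near powers of ten.
    while num_revisions >= 10:
        num_revisions //= 10
        copies += 1
    return copies


for n in (10, 100, 1000):
    print(n, estimated_copies(n))
```

So the disk overhead grows very slowly: going from 100 to 1000 revisions adds
only one more copy in this model.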

However, this is only for things that build up over time. A plain "bzr push"
will create 1 copy, and it will be rare that you push enough to make a huge
difference there. (If you have a 1000 revision pack, and a few 10 revision
packs, the 10 revision ones may get repacked, but the big one will probably be
left alone for a long time.)

Robert did open a bug about it.

At the moment we have to be a little careful, because an auto-repack will
break readers: the file that they thought existed has now been moved or
deleted.
There are ways to minimize this. Readers can know that the packs will be moved
to this new location and just look there, or they can reset their names list
and re-search for the entry they were looking for. The data isn't deleted, just
moved.
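
The first of those options could look something like the sketch below. This is
only an illustration under stated assumptions: the function read_pack and the
live directory name "packs" are hypothetical, and only obsolete_packs comes
from the document under review. It is not how bzr's readers are implemented.

```python
import os


def read_pack(repo_dir, pack_name, live_dir="packs",
              obsolete_dir="obsolete_packs"):
    """Read a pack file, tolerating a concurrent autopack.

    Hypothetical helper: try the live location first, and if the file
    has vanished, look in the directory that repacked files are moved to.
    """
    for subdir in (live_dir, obsolete_dir):
        path = os.path.join(repo_dir, subdir, pack_name)
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError:
            # The pack was moved (or never lived here); try the next place.
            continue
    raise FileNotFoundError(pack_name)
```

The second option (reset the names list and search again) avoids reading stale
packs at all, at the cost of redoing the lookup.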


> 
> Along the same lines, I guess a repack operation will still do nothing
> for unreferenced revisions, as the repository doesn't know about the
> branches, and packs don't change that.
> 
> Thanks,
> 
> James
> 


Correct. In a shared repository you still don't know what revisions are or are
not referenced, unless you go out and check for branches.

We are still planning on having a manual repack step, which can optimize the
disk layout more. The current autopack is meant to be generally "cheap" and
just move the chunks a bit. A full repack could reorder patches to favor new
revisions versus old revisions, etc.
We certainly could add a '--garbage-collect' flag to the repack which could do
the search for what entries are/are not referenced.
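
A minimal sketch of what that search might involve, assuming we have already
gone out and collected the branch tips. The names unreferenced_revisions,
parent_map and branch_tips are made up for this example. It is a plain
reachability walk over the revision graph, not an existing bzr API.

```python
def unreferenced_revisions(parent_map, branch_tips):
    """Return revision ids not reachable from any branch tip.

    parent_map: dict mapping revision id -> list of parent revision ids
                (hypothetical in-memory view of the repository).
    branch_tips: iterable of tip revision ids found by checking branches.
    """
    reachable = set()
    pending = list(branch_tips)
    while pending:
        rev = pending.pop()
        # Skip revisions already visited and ids the repository lacks.
        if rev in reachable or rev not in parent_map:
            continue
        reachable.add(rev)
        pending.extend(parent_map[rev])
    return set(parent_map) - reachable


history = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
print(unreferenced_revisions(history, ["c"]))
```

Here "x" has no branch pointing at it or at any descendant, so it is the only
candidate for collection.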

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHH6GDJdeBCYSNAAMRAhsuAJ9fCeHlqmA5G+7wX5YjFIq3c4wUcACgopbL
y1lplmGCSBuCeyBykQ3BN0c=
=BWPg
-----END PGP SIGNATURE-----


