[RFC] Cheap re-pack?

John Arbash Meinel john at arbash-meinel.com
Thu Sep 6 16:04:16 BST 2007


Aaron Bentley wrote:
> Hi all,
> 
> With the upcoming pack format, a re-pack is scheduled for every 10
> commits.  AIUI, that's an expensive operation, because it requires
> generating deltas for 10 commits at re-pack time.
> 
> Instead, we could amortize the cost across the 10 commits, by
> calculating the deltas at commit time.  Since packs use reverse deltas,
> you would calculate the delta from the revision you're committing to its
> leftmost ancestor.  So the pack generated by committing revision 10
> would also contain the delta from 10 to 9.  The pack for 9 would also
> contain the delta to 8.  Etc.
> 
> This would mean that repacks would be filtering operations: You read the
> deltas for 9 revisions and the fulltext for 1 revision, and you write
> those to a new pack.  That would make them IO-bound, not CPU-bound.
> 
> Alternatively, if deltas were too big, we could just cache the
> get-matching-blocks output at commit time, and calculate the deltas at
> repack time.
> 
> Aaron
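[Editor's sketch: a toy illustration of the scheme Aaron describes, using made-up data structures and difflib standing in for bzr's matcher -- none of these names are real bzr code. Each commit stores its fulltext plus one reverse delta against the previous tip, so a repack degenerates into copying existing entries:]

```python
import difflib

def make_delta(new_lines, old_lines):
    """Reverse delta: opcodes that rebuild old_lines from new_lines."""
    sm = difflib.SequenceMatcher(None, new_lines, old_lines)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))              # reuse a range of the newer text
        else:
            ops.append(("insert", old_lines[j1:j2]))  # literal lines from the older text
    return ops

def apply_delta(new_lines, ops):
    """Rebuild the older text from the newer text plus a reverse delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(new_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

def commit(parent_pack, revid, lines):
    """Write a one-commit pack: the new fulltext, plus the reverse delta
    that rebuilds the previous tip from it (computed once, at commit time)."""
    pack = {"fulltext": (revid, lines), "deltas": []}
    if parent_pack is not None:
        parent_id, parent_lines = parent_pack["fulltext"]
        pack["deltas"] = [(parent_id, make_delta(lines, parent_lines))]
    return pack

def repack(packs):
    """Combine packs (oldest-first) by keeping only the newest fulltext and
    copying every existing reverse delta -- a filtering pass, no delta math."""
    combined = {"fulltext": packs[-1]["fulltext"], "deltas": []}
    for pack in reversed(packs):
        combined["deltas"].extend(pack["deltas"])
    return combined

def get(pack, revid):
    """Walk the reverse-delta chain back from the tip to recover any text."""
    current_id, current = pack["fulltext"]
    if current_id == revid:
        return current
    for delta_id, ops in pack["deltas"]:
        current = apply_delta(current, ops)
        if delta_id == revid:
            return current
    raise KeyError(revid)
```

[Committing revisions 1..10 would leave ten small packs; `repack` keeps only revision 10's fulltext and inherits the nine existing deltas, recovering older texts by replaying deltas back from the tip. Aaron's alternative -- caching `SequenceMatcher.get_matching_blocks()` output at commit time and deriving deltas at repack time -- would fit the same shape.]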

Interesting thought.

However, a real repack may be much more involved than a simple filtering pass.

I don't think we want to restrict packs to only using ancestral ordering. I know
that *I* found ancestral ordering to give the best compression (by a fairly
significant margin), but there are reasons to try different delta arrangements.
It would be nice if we could put that into place without having to rewrite all
of the pack logic.

Just something to think about. I think you might have a good idea, but we may
also end up spending a significant amount of time computing these new deltas,
making the average commit slower.

John
=:->


