[RFC] compression-grouping records in pack files.

Aaron Bentley aaron.bentley at utoronto.ca
Thu Jun 21 14:06:33 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

It occurs to me that, for bundles at least, there would be advantages to
compressing a series of records as a group.

Compression has a disadvantage: it hinders random access.  Without
random access, it is hard to have bundles be efficient, yet behave like
a repository.

But with delta-compressed versions of files, we rarely need true random
access.  To build a version, we need all its prerequisites.  Those comprise:
1. a snapshot
2. all the ancestors of that version which follow the snapshot.
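
As a minimal sketch of that prerequisite rule (not Bazaar code): assume
each record is either a "snapshot" or a "diff" against exactly one
parent.  The `records` mapping and the `prerequisites` helper are
hypothetical names used only for illustration.

```python
def prerequisites(records, version):
    """Return the ids needed to build `version`, snapshot first.

    `records` maps an id to (kind, parent), where kind is
    "snapshot" or "diff".  Linear history is assumed.
    """
    chain = []
    current = version
    while True:
        kind, parent = records[current]
        chain.append(current)
        if kind == "snapshot":
            break  # nothing before the snapshot is needed
        current = parent
    chain.reverse()
    return chain


# Hypothetical history with two snapshots and their diff chains.
records = {
    1: ("snapshot", None), 2: ("diff", 1), 3: ("diff", 2), 4: ("diff", 3),
    5: ("snapshot", None), 6: ("diff", 5), 7: ("diff", 6),
}
print(prerequisites(records, 3))
print(prerequisites(records, 7))
```

Everything a version needs sits in one contiguous run that starts at
a snapshot, which is what makes grouping by snapshot attractive.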

So for knit-like storage, I imagine a layout like this:

=================
CompressionGroup1
- -----------------
snapshot 1
.................
diff 2
.................
diff 3
.................
diff 4
=================
CompressionGroup2
- -----------------
snapshot 5
.................
diff 6
.................
diff 7

We would then achieve reasonable compression, because the diffs are
closely related to each other, and to the snapshot.
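
A rough way to check that claim, as a sketch only: build a snapshot
and a few unified diffs from made-up file contents, then compare
compressing each record separately with compressing them as one group.
zlib is an assumption here (the proposal does not name a compressor),
and `make_versions`, `make_records` and `compare_sizes` are
hypothetical helpers.

```python
import difflib
import zlib


def make_versions(count):
    """Made-up file history: each version changes one line."""
    base = ["line %d of an example file\n" % i for i in range(40)]
    versions = [base]
    for v in range(1, count):
        new = list(versions[-1])
        new[v * 5] = "line changed in version %d\n" % v
        versions.append(new)
    return versions


def make_records(versions):
    """First record is a full snapshot, the rest are unified diffs."""
    records = ["".join(versions[0]).encode("ascii")]
    for old, new in zip(versions, versions[1:]):
        diff = "".join(difflib.unified_diff(old, new))
        records.append(diff.encode("ascii"))
    return records


def compare_sizes(records):
    """Return (bytes if compressed separately, bytes if grouped)."""
    separate = sum(len(zlib.compress(r)) for r in records)
    grouped = len(zlib.compress(b"".join(records)))
    return separate, grouped


records = make_records(make_versions(4))
separate, grouped = compare_sizes(records)
print("separately:", separate, "bytes; as a group:", grouped, "bytes")
```

The grouped form lets the compressor reuse the snapshot's text when it
reaches the context lines of each diff, which separate streams cannot
do.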

But at the same time, we hardly compromise any random-access
capabilities, because reading, say, diff 6 without reading snapshot 5
is not very valuable.
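
To make the random-access trade-off concrete, here is a sketch under
stated assumptions: groups are zlib streams laid end to end, and a
separate in-memory index remembers where each record lives.  Neither
the index layout nor the names `write_groups` and `read_record` come
from the pack format; they are invented for this example.

```python
import zlib


def write_groups(groups):
    """Compress each group of (name, data) records as one unit.

    Returns (pack_bytes, index), where index maps a record name to
    (group_offset, group_length, offset_in_group, record_length).
    """
    pack = bytearray()
    index = {}
    for group in groups:
        body = bytearray()
        spans = []
        for name, data in group:
            spans.append((name, len(body), len(data)))
            body += data
        compressed = zlib.compress(bytes(body))
        group_offset = len(pack)
        pack += compressed
        for name, start, length in spans:
            index[name] = (group_offset, len(compressed), start, length)
    return bytes(pack), index


def read_record(pack, index, name):
    """Decompress only the group that holds `name`, then slice it out."""
    group_offset, group_length, start, length = index[name]
    body = zlib.decompress(pack[group_offset:group_offset + group_length])
    return body[start:start + length]


groups = [
    [("snapshot 1", b"full text one\n"), ("diff 2", b"+a\n"),
     ("diff 3", b"+b\n"), ("diff 4", b"+c\n")],
    [("snapshot 5", b"full text five\n"), ("diff 6", b"+d\n"),
     ("diff 7", b"+e\n")],
]
pack, index = write_groups(groups)
print(read_record(pack, index, "diff 6"))
```

Reading diff 6 costs a decompression of group 2 only, and that work
also yields snapshot 5, which is needed to apply the diff anyway.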

This would allow compression without obscurity, so it would avoid the
need to compress and decompress whole pack files; the pack files could
stay in their most readable format, but most of their data would be
compressed.

When iterating through records, it would be nice if compression-groups
were not iterated through directly, only the records they contain.
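
One way that iteration could look, purely as a sketch: assume each
group is a zlib stream whose body is a sequence of records, each
preceded by a 4-byte big-endian length.  That framing, and the names
`pack_group` and `iter_records`, are assumptions for illustration and
not part of any existing pack format.

```python
import struct
import zlib


def pack_group(records):
    """Length-prefix each record, then compress the whole group."""
    body = b"".join(struct.pack(">I", len(r)) + r for r in records)
    return zlib.compress(body)


def iter_records(compressed_groups):
    """Yield the contained records; the groups themselves never surface."""
    for blob in compressed_groups:
        body = zlib.decompress(blob)
        pos = 0
        while pos < len(body):
            (length,) = struct.unpack_from(">I", body, pos)
            pos += 4
            yield body[pos:pos + length]
            pos += length


groups = [
    pack_group([b"snapshot 1", b"diff 2", b"diff 3", b"diff 4"]),
    pack_group([b"snapshot 5", b"diff 6", b"diff 7"]),
]
for record in iter_records(groups):
    print(record)
```

A caller sees a flat stream of seven records and does not need to know
where one group ends and the next begins.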

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGenfY0F+nu1YWqI0RAnTTAJ9G+BC8oSIbvRZETxscpisg7lGijgCgiOhu
6QeDCjryMcB30LQgkDgGqMA=
=yMdj
-----END PGP SIGNATURE-----


