[RFC] compression-grouping records in pack files.
Aaron Bentley
aaron.bentley at utoronto.ca
Thu Jun 21 14:06:33 BST 2007
Hi all,
It occurs to me that, for bundles at least, there would be advantages to
compressing a series of records as a group.
Compression has a disadvantage: it hinders random access. Without
random access, it is hard to have bundles be efficient, yet behave like
a repository.
But with delta-compressed versions of files, we rarely need true random
access. To build a version, we need all its prerequisites. Those comprise:
1. A snapshot
2. All the ancestors of that version that follow the snapshot.
So for knit-like storage, I imagine a layout like this:
=================
CompressionGroup1
-----------------
snapshot 1
.................
diff 2
.................
diff 3
.................
diff 4
=================
CompressionGroup2
-----------------
snapshot 5
.................
diff 6
.................
diff 7
We would then accomplish reasonable compression, because the diffs are
closely related to each other, and to the snapshot.
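
As a rough illustration of that claim, the sketch below compresses one
snapshot and three small diffs first separately and then as a single
stream. The record contents are made up for the example, and zlib is only
an assumed choice of compressor; nothing here reflects an actual pack
format.

```python
import zlib

# Hypothetical records: one snapshot followed by small diffs against it.
snapshot = "".join(f"line {i}: some shared content\n" for i in range(50))
diffs = [
    f"@@ line {i}\n-line {i}: some shared content\n+line {i}: some changed content\n"
    for i in (3, 17, 42)
]
records = [r.encode("utf-8") for r in [snapshot] + diffs]

# Compress each record on its own, then all of them as one stream.
separate_size = sum(len(zlib.compress(r)) for r in records)
grouped_size = len(zlib.compress(b"".join(records)))

print("separately:", separate_size, "bytes")
print("as a group:", grouped_size, "bytes")
```

The grouped stream can refer back to text already seen in the snapshot,
which is where the saving comes from.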
But at the same time, we hardly compromise any random-access
capabilities, because reading, say, diff 6 without reading snapshot 5
is not very valuable.
This would allow compression without obscurity, so it would avoid the
need to compress and decompress whole pack files. The pack files could
stay in their most readable format, but most of their data would be
compressed.
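
One way this could look in practice is sketched below. Each group is
compressed independently, and an index (an assumption of this sketch, not
something the proposal specifies) maps a record name to its group's
offset and its position within the group. The names write_groups,
read_group and read_with_prerequisites are hypothetical, as is the
4-byte length prefix used to split records. The sketch also assumes each
group is a linear chain, so every record before the requested one is a
prerequisite.

```python
import zlib

def write_groups(groups):
    """Serialise groups of (name, bytes) records, snapshot first.

    Returns (data, index), where index maps a record name to
    (group offset, compressed group length, position within the group).
    """
    out = bytearray()
    index = {}
    for records in groups:
        # Length-prefix each record so the group can be split after inflating.
        body = b"".join(len(text).to_bytes(4, "big") + text for _, text in records)
        blob = zlib.compress(body)
        offset = len(out)
        out += blob
        for pos, (name, _) in enumerate(records):
            index[name] = (offset, len(blob), pos)
    return bytes(out), index

def read_group(data, offset, length):
    """Decompress one group and split it back into its records."""
    body = zlib.decompress(data[offset:offset + length])
    records = []
    i = 0
    while i < len(body):
        size = int.from_bytes(body[i:i + 4], "big")
        records.append(body[i + 4:i + 4 + size])
        i += 4 + size
    return records

def read_with_prerequisites(data, index, name):
    """Return the snapshot and diffs up to and including the named record."""
    offset, length, pos = index[name]
    return read_group(data, offset, length)[:pos + 1]

groups = [
    [("snapshot 1", b"s1"), ("diff 2", b"d2"), ("diff 3", b"d3"), ("diff 4", b"d4")],
    [("snapshot 5", b"s5"), ("diff 6", b"d6"), ("diff 7", b"d7")],
]
data, index = write_groups(groups)
needed = read_with_prerequisites(data, index, "diff 6")
```

Reading diff 6 touches only the second group; the first group is never
decompressed.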
When iterating through records, it would be nice if the iteration
yielded only the records that compression-groups contain, not the
groups themselves.
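
A minimal sketch of such an iterator follows. It assumes a container
whose entries are tagged either "record" or "group"; that tagging, the
4-byte length prefix, and the names pack_group and iter_records are all
hypothetical and chosen only for illustration.

```python
import zlib

def pack_group(records):
    """Compress a list of byte records into one length-prefixed group."""
    body = b"".join(len(r).to_bytes(4, "big") + r for r in records)
    return zlib.compress(body)

def iter_records(entries):
    """Yield plain records, transparently expanding compression groups.

    entries is an iterable of ("record", bytes) or
    ("group", compressed bytes) pairs.
    """
    for kind, payload in entries:
        if kind == "group":
            body = zlib.decompress(payload)
            i = 0
            while i < len(body):
                size = int.from_bytes(body[i:i + 4], "big")
                yield body[i + 4:i + 4 + size]
                i += 4 + size
        else:
            yield payload

entries = [
    ("record", b"revision metadata"),
    ("group", pack_group([b"snapshot 5", b"diff 6", b"diff 7"])),
]
for record in iter_records(entries):
    print(record)
```

The caller sees a flat sequence of records and never handles a group
directly.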
Aaron