On Sat, Aug 7, 2010 at 6:35 AM, Gordon Tyler <gordon.tyler@gmail.com> wrote:
> My other thought was to separate packs by the size of the content, i.e.
> files < 1MB go in one pack and files > 1MB go in another, maybe with a
> few more levels as necessary. The small files which change a lot in
> comparison to the large files wouldn't cause repacking of the large files.

FWIW, we do something like this in our own code. A single "project" created by our software is typically tens of gigabytes. Ideally, projects would be broken up into dozens of files, some big, some small, for fast, easy access. But users need to be able to re-open old projects and share projects with their collaborators, so they demand One Big File. And we compress it, in chunks.

We use a couple of acceleration methods to minimize the amount of unchunking we have to do before our GUI can continue interacting with the user. First, we try to organize the chunks in a logarithmic size progression (see the sketch below). In some cases -- and this can have a huge impact on response time -- we chunk together files in a logarithmic progression of access frequency. We have an advantage that bzr does not: we can guess pretty well which files will be needed early and often, since our app provides a controlled environment. So we never repack, although we have found a few cases where it might help.

I'm not claiming anything special. I'm pretty sure these ideas would seem quaint to database and compression gurus. Whether these are of any value in bzr I do not dare guess.
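To make "logarithmic size progression" concrete, here's a rough Python sketch of the bucketing idea. The 64 KB base bucket and the power-of-two steps are made-up numbers for illustration, not our actual parameters.

    import math
    from collections import defaultdict

    BASE = 64 * 1024  # illustrative base bucket: files up to 64 KB

    def bucket_for(size_bytes):
        # Bucket 0 holds everything up to BASE; each further bucket
        # covers the next doubling (64-128 KB, 128-256 KB, ...).
        if size_bytes <= BASE:
            return 0
        return int(math.log2(size_bytes / BASE)) + 1

    def group_into_chunks(files):
        # files: iterable of (name, size_bytes) pairs.
        # Returns {bucket: [names]}, so each chunk holds files of a
        # similar magnitude, and churn in the small buckets never
        # forces the big buckets to be rewritten.
        chunks = defaultdict(list)
        for name, size in files:
            chunks[bucket_for(size)].append(name)
        return chunks

The access-frequency variant is the same idea with access counts in place of sizes, so the hot files land together in a few chunks that can be unchunked first.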
~M